Best Tools for Deepfake and Hybrid Attack Simulation Training

Research

Written by

Brightside Team

A finance manager gets an email from what appears to be the CFO. The message is brief: there's a time-sensitive vendor payment that needs processing, and a call will follow shortly to walk through the details. Ten minutes later, the phone rings. The voice on the other end sounds exactly like the CFO, knows the reference number from the email, and explains the urgency in the calm, authoritative tone that's always marked that relationship. The manager processes the payment.

No malware was installed, no link was clicked, and nothing triggered an alert. The money was gone anyway.

A hybrid deepfake attack is a coordinated social engineering attack that uses a phishing email to establish context and credibility, followed by a live AI-generated voice or video call that pushes the target to act. Neither channel fully carries the attack on its own. The email doesn't need to be convincing enough to prompt immediate action. It only needs to make the call feel expected. And the call doesn't need to arrive out of nowhere and hope the target complies. It arrives as a confirmation of something the target is already thinking about.

TL;DR

  • A hybrid deepfake attack combines a phishing email with an AI-generated voice or video follow-up, where each channel makes the other more believable

  • The email's job is not to convince the target on its own. It's to prepare them for the call

  • This two-step structure is more dangerous than either channel alone because it removes the two defenses that typically protect employees: skepticism of unexpected messages, and reluctance to act without confirmation

  • Standard phishing simulations miss this risk entirely because they test one channel in isolation

  • Only a small number of platforms can generate a phishing email and a live AI vishing call as a single coordinated simulation campaign

What Is a Hybrid Deepfake Attack?

Hybrid deepfake attacks are not sophisticated in the way that technical exploits are sophisticated. There's no zero-day, no lateral movement, no custom malware. The complexity is entirely social, and that's what makes them so hard to train against using conventional methods.

The attack pattern works in two layers. First, a phishing email arrives. It's personalized, addressed by name, referencing an action that fits the target's job: approving a payment, verifying a vendor, handling an urgent access request. It tells the target to expect a follow-up call. Second, the call arrives. It uses a cloned voice or deepfake video impersonating someone the target knows and trusts, typically a senior executive. The caller references the email, adds urgency, and guides the target toward the desired action before they have time to apply independent judgment.

This is different from ordinary phishing, which depends entirely on one message being convincing enough. It's different from ordinary vishing, where the call arrives cold and the target can reasonably ask "who is this and why are you calling me?" It's also different from traditional business email compromise, where the attacker relies only on text. In a hybrid attack, the email and the call are two parts of a single designed interaction, each one validating the other.

The email tells the target what to expect. The call confirms it. Together, they dismantle the two defenses that protect employees when each channel operates alone.

How a Hybrid Deepfake Attack Works Step by Step

Understanding the mechanics helps explain why employees fail even when they know they should be careful.

1. Reconnaissance. Before the attack begins, the attacker researches the target using publicly available information: their LinkedIn profile, the company's website, executive bios, press releases, published earnings calls, and any audio or video of the executive they plan to impersonate. Most organizations publish more than enough material for a capable attacker to work with.

2. Email setup. The phishing email arrives first. It's addressed to the target by name, comes from a spoofed or compromised-looking executive address, and references a plausible business matter. Critically, it mentions that the executive will call shortly to discuss. This single line does most of the psychological work.

3. Trust priming. In the time between the email and the call, something shifts. What would otherwise be an unsolicited phone call from someone claiming to be the CFO is now a call the target is actively waiting for. Surprise is removed. The target's mental frame has already been set.

4. The AI call. A live AI agent places the call, generating the executive's voice in real time rather than from a recording. It references the email, uses the target's name, confirms the context, and applies the social pressure pattern that makes the scenario feel urgent but legitimate.

5. Pressure. The attacker uses authority and urgency in combination. The tone is confident, the request is framed as already approved at a higher level, and the time window is short. These tactics compress the target's decision window in exactly the way real attackers design them to.

6. Action. The target complies before they've had a chance to pause and verify. By the time they think to call the real CFO through a separate channel, the transaction is already done.

Why Hybrid Deepfake Attacks Are Harder to Spot Than Regular Phishing

The voice sounding real is the obvious explanation. It's also the incomplete one.

The deeper problem is structural. By the time the call arrives, the target is not evaluating a cold request from an unknown source. They're responding to what feels like a confirmation from someone they already expect to hear from. Research on social compliance shows that people who have already received a message are significantly less likely to question a follow-up that references it. The prior email doesn't just set context. It lowers resistance to the call that follows.

Compound that with voice familiarity. Hearing a voice that sounds like a trusted executive activates the same social and professional compliance instincts that govern real workplace interactions. The target isn't thinking "is this a deepfake?" They're thinking "my CFO needs this and she sounds stressed."

The honest reality is that asking employees to detect synthetic audio in real time is not a realistic defense. Studies confirm that humans cannot reliably distinguish AI-generated speech from real speech under normal conditions, and performance is worse under time pressure. The goal of training should not be teaching employees to spot something they statistically cannot spot. It should be teaching them to pause, verify through an independent channel, and report anything that feels like it's pushing them to act fast.

Real-World Cases Security Teams Should Study

Publicly reported hybrid deepfake cases almost certainly understate how often these attacks occur, partly because organizations don't always disclose them and partly because attribution takes time. But the incidents that have been reported show how consistently the two-channel pattern appears.

In a fraud case involving a UAE bank, a finance employee reportedly received an email referencing an urgent transaction, followed by a voice-cloned call that sounded like a trusted executive. The call referenced the email directly. The employee approved the transfer. The key detail is not the quality of the voice clone. It's that the attack was designed so the call didn't need to stand on its own.

WPP disclosed that its CEO was targeted by attackers who used a voice clone and a fake meeting-style interaction to instruct staff to share credentials and move money. The attack combined digital access, a fabricated identity, and real-time voice synthesis. The voice clone was not the whole attack. The context around it was.

The trajectory is clear going back further. In 2019, a UK energy company executive transferred $243,000 after receiving a call from someone using a convincing AI voice clone of his German parent company's CEO, complete with accent and speech patterns. That was a voice-only attack. Today's hybrid attacks add an email layer that makes the same dynamic even harder to interrupt.

Attackers are moving from one fake message to coordinated fake conversations.

Which Employees Are Most Likely to Be Targeted First

Attackers do not target randomly. They look for employees whose role gives them the ability to take the action being requested, and whose workflow includes responding to executive-level communications without always having time to independently verify.

Finance and AP teams are the primary targets in most documented fraud cases because they can authorize payments and change vendor bank details with limited secondary oversight. The volume of urgent executive requests they routinely handle makes a fabricated one harder to notice.

Executive assistants present a different kind of risk. They don't always authorize wire transfers directly, but they act on behalf of leadership with enough operational access to set up the conditions for a larger compromise. An attacker who can impersonate the right executive to the right assistant can move faster than almost any other route.

IT helpdesk and system administrators are targeted through voice authority attacks. A cloned voice asking urgently for a password reset, an MFA bypass, or emergency remote access is a direct path to credential compromise without touching a single endpoint.

HR and recruiting teams face a growing threat through deepfake interview scenarios, where attackers pose as candidates to gain system access during onboarding flows or extract sensitive personnel and payroll data.

New hires are particularly exposed because they haven't yet internalized what normal executive communication looks like in their organization. They don't yet know that the CFO never calls directly about invoices, or that IT never asks for credentials over the phone.

If these are the roles attackers target first, they should be the first roles tested in simulation.

Why Traditional Phishing Simulations Miss This Threat Pattern

Most phishing simulations measure one behavior: did the employee click the link or submit credentials? That's a useful metric for a single-channel email attack. It tells you almost nothing about how your team will respond when a convincing AI voice call follows up on a phishing email they already received.

Standalone vishing simulations have the same problem from the other direction. A cold call from an unknown number claiming to be the CFO is easier to question than a call that arrives after the CFO's email said it would. Testing the call in isolation produces a different result than testing the two channels together.

Most security awareness platforms were designed before multi-channel AI attacks were feasible to run at scale. Their simulation architecture reflects how attackers operated five or six years ago: one message, one channel, one decision point. Most training programs are still built around those threats.

Running a phishing test and a vishing call in separate quarterly programs and calling it multi-channel training is not the same as simulating a coordinated hybrid attack. The attack's power comes from the relationship between the two channels. Testing them separately doesn't capture that dynamic.

Top 5 Best Tools for Simulating Deepfake and Hybrid Attacks to Train Employees

The platforms below were selected based on their ability to simulate voice-based and multi-channel AI attacks, with specific attention to whether they can generate an email and a live AI call as a single coordinated campaign — the pattern at the center of this article.

1. Brightside AI

Brightside is a Swiss security awareness platform that covers phishing, vishing, deepfake awareness, and hybrid attacks within one admin environment. Its Hybrid Attack mode is the most direct simulation equivalent of the attack pattern this article describes. Admins build the phishing email and the live AI vishing call from the same five-step template, so both channels share the same attack goal, caller persona, and context. There's no stitching together of separate tools or syncing campaign timing manually.

The live call is handled by an AI agent that adapts in real time based on what the target says, rather than playing back a recording or dropping a voicemail. Admins select social engineering tactics from a set that includes authority impersonation, fear and threat framing, curiosity hooks, pretexting, and reciprocity, with urgency level and conversational tone configured independently. For organizations that want to impersonate a specific executive, a short audio upload is enough to create a cloned voice for use across targeted scenarios.

Before any simulation reaches employees, admins can test the full call in-browser, hearing exactly how the AI sounds and responds in real time. Simulation difficulty aligns to the NIST Phish Scale, follow-up training triggers automatically on failure, and a cooling period prevents the same domain from being used against the same employee within three months. Integrations with Google Workspace, Microsoft Active Directory, Okta, and Vanta handle employee provisioning automatically.

Best for: Organizations that want phishing, vishing, and native hybrid simulation in one place, with live adaptive AI calls and NIST-aligned difficulty calibration.

2. Jericho Security

Jericho uses agentic AI to run voice and video deepfake simulations with live adaptive conversations and voice cloning capability. Its credentials include a U.S. Department of Defense deployment and a $15 million Series A in 2025, and it simulates attacks across email, SMS, and deepfake vectors. The deepfake training covers vishing calls, face-swap video impersonation, and synthetic identity scenarios, with a focus on building verification protocols alongside attack recognition. Email and call simulations are not presented as a unified hybrid campaign workflow, so coordinating both channels requires separate configuration. It remains one of the strongest options for organizations that prioritize maximum realism on standalone voice and video deepfake scenarios.

Best for: Enterprise and government organizations with high-realism requirements for vishing and deepfake video simulation.

3. Hoxhunt

Hoxhunt delivers phishing simulation alongside a deepfake scenario that starts with an email and routes the target to a fake browser-based meeting page styled to look like Teams, Meet, or Zoom. A pre-scripted avatar using a cloned voice delivers an urgent message from an impersonated executive. The avatar follows a fixed script rather than adapting dynamically to the target, and the simulation ends when the target clicks a link inside the fake meeting. Micro-training triggers immediately after. Hoxhunt's strongest evidence is behavioral: fail rates drop by 5.5x over 12 months of continuous simulation even as difficulty increases. For organizations whose primary threat surface runs through collaboration platforms rather than voice calls, it's a well-validated option.

Best for: Teams focused on long-term behavioral improvement where fake meeting and collaboration platform impersonation are the main concern.

4. Arsen

Arsen offers deepfake awareness through vishing simulations and workshop-based demonstrations, combined with spear-phishing simulations and dark web monitoring. The platform focuses on exposing employees to AI-generated voice and face-swap impersonation in a way that builds recognition and response habits. Dark web monitoring integration is a confirmed feature that adds meaningful context to training by surfacing what employee data is already exposed. The specific structure of multi-step coordinated attack flows is not detailed in Arsen's public documentation.

Best for: Organizations that want deepfake awareness training paired with dark web monitoring, particularly in French-speaking markets.

5. Adaptive Security

Adaptive Security runs simulations across email, phone, and SMS using conversational AI agents that adapt in real time, making it one of the few platforms that covers live voice-based attack simulation and email in the same product. Its defining feature is OSINT-driven personalization: the platform pulls public data including executive interviews, press releases, LinkedIn profiles, regulatory filings, and employee social media to build scenarios that reflect what a real attacker would actually construct for a specific organization. For executive-level awareness, it offers a CEO Deepfake Demo — a hyper-realistic video deepfake of the organization's own leadership, designed to make the threat concrete for board and C-suite audiences. Multi-channel delivery across email and phone is available, though Adaptive does not present a single unified template builder for coordinating both channels as one hybrid campaign.

Best for: Organizations that prioritize OSINT-driven personalization and executive-level deepfake awareness across email, voice, and SMS.

| Platform | Email + Call in One Workflow | Live AI Voice | Voice Cloning | Deepfake Video | NIST-Aligned Difficulty |
| --- | --- | --- | --- | --- | --- |
| Brightside AI | Yes | Yes, live adaptive AI | Yes | Yes | Yes |
| Jericho | No unified workflow | Yes | Yes | Yes | No |
| Hoxhunt | Email + fake meeting page | Pre-scripted avatar | Yes | Yes | No |
| Arsen | Not publicly documented | Yes, vishing simulations | Not confirmed | Yes, workshops | No |
| Adaptive Security | Multi-channel, not unified | Yes, conversational AI agents | Not confirmed | Yes, CEO deepfake demo | No |

Try our vishing simulator

Experience the most advanced voice phishing simulator built for security teams. Create scenarios, test voice cloning, and explore automation features.

How to Test Your Team Before Real Attackers Do

Running a hybrid deepfake simulation well requires more than turning on a new campaign type. The setup decisions determine the quality of the data you get back.

Start with your highest-risk cohort. Finance, executive assistants, and IT helpdesk are the natural starting point. They carry the most financial and access risk per failed simulation, and their baseline gives you the most actionable data.

Choose a scenario anchored in a real business process. The most effective pretexts map directly to how your organization actually works: an invoice approval, a vendor payment change, an urgent IT access request, or an executive authorization that bypasses the usual workflow. Generic scenarios don't produce useful data.

Connect the email and call explicitly. The email should name the caller, reference a specific request, and tell the target to expect a call. The call should open by referencing the email. Without that connection, you're running a phishing test and a vishing test in sequence, not simulating the coordination that makes hybrid attacks work.
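
To make that coordination concrete, here is a minimal sketch of what a hybrid campaign definition could look like when both channels are built from one source of truth. The structure, names, and values below are hypothetical and are not tied to any particular platform's configuration format.

```python
# Illustrative only: a hypothetical hybrid campaign definition. All field
# names and values are invented for this sketch, not a real platform's API.
campaign = {
    "persona": {
        "name": "Dana Reyes",            # impersonated executive (fictional)
        "role": "CFO",
        "voice_profile": "cloned-cfo-v1",
    },
    "pretext": {
        "subject": "Vendor payment update - ref INV-20417",
        "reference_id": "INV-20417",     # the detail both channels must repeat
        "expect_call_within_minutes": 15,
    },
    "email": {
        "body_mentions_call": True,       # "I'll call you shortly to confirm"
    },
    "call": {
        "opens_with_email_reference": True,  # call starts by citing the email
        "urgency": "high",
        "tactics": ["authority", "time_pressure"],
    },
}

# A quick consistency check: the simulation only models a hybrid attack if
# both channels point at each other and at the same reference.
assert campaign["email"]["body_mentions_call"]
assert campaign["call"]["opens_with_email_reference"]
assert campaign["pretext"]["reference_id"] in campaign["pretext"]["subject"]
```

The point of a single shared definition is that the caller name, reference number, and timing cannot drift between the email and the call, which is exactly the consistency a real attacker would engineer.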

Use realistic pressure. Real attackers sound like someone who already has the authority to make the request. An urgency level and tone that fits normal executive communication in your organization will produce a more useful result than an obviously aggressive simulation employees recognize immediately as a test.

Stagger the rollout. If a large group receives the simulation simultaneously, someone will warn their colleagues. Staggered delivery maintains the element of surprise and produces more accurate baseline data.
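
If delivery is scheduled programmatically, a rough sketch like the one below spreads a cohort across waves a few business days apart. The target list and dates are placeholders.

```python
# Illustrative only: one simple way to stagger delivery so an early wave
# doesn't tip off the rest of the cohort. Names and dates are hypothetical.
import random
from datetime import date, timedelta

targets = ["ap_clerk_1", "ap_clerk_2", "exec_assistant_1",
           "helpdesk_1", "helpdesk_2", "finance_manager_1"]

random.shuffle(targets)            # avoid grouping by team or seniority
waves = 3
start = date(2025, 3, 3)

schedule = {}
for i, target in enumerate(targets):
    wave = i % waves
    # spread waves several days apart so early recipients who warn colleagues
    # don't contaminate the later baseline
    schedule[target] = start + timedelta(days=wave * 3)

for target, send_date in sorted(schedule.items(), key=lambda kv: kv[1]):
    print(f"{send_date}  {target}")
```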

Trigger training immediately after failure. Point-of-error training, delivered within minutes of a failed simulation, reduces susceptibility significantly compared to remediation scheduled days or weeks later. The lesson lands when the experience is still fresh.

Measure more than fail rate. Track answer rate, call duration, time from email delivery to action, and reporting rate. These metrics reveal exactly where in the attack chain employees are most vulnerable, which is far more useful than a single pass or fail score.
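
For teams working from a raw export of simulation results, those metrics are straightforward to compute. The sketch below assumes a hypothetical record format; field names will differ by platform.

```python
# Illustrative only: computing call-level metrics from a hypothetical export
# of hybrid simulation results. The record format is invented for this sketch.
from statistics import median

results = [
    # answered, duration_s, email_to_action_s, complied, reported
    {"answered": True,  "duration_s": 212, "email_to_action_s": 780,  "complied": True,  "reported": False},
    {"answered": True,  "duration_s": 95,  "email_to_action_s": None, "complied": False, "reported": True},
    {"answered": False, "duration_s": 0,   "email_to_action_s": None, "complied": False, "reported": False},
    {"answered": True,  "duration_s": 301, "email_to_action_s": 540,  "complied": True,  "reported": False},
]

total = len(results)
answered = [r for r in results if r["answered"]]

answer_rate = len(answered) / total
fail_rate = sum(r["complied"] for r in results) / total
report_rate = sum(r["reported"] for r in results) / total
median_call_duration = median(r["duration_s"] for r in answered)
times_to_action = [r["email_to_action_s"] for r in results
                   if r["email_to_action_s"] is not None]
median_time_to_action = median(times_to_action) if times_to_action else None

print(f"answer rate: {answer_rate:.0%}, fail rate: {fail_rate:.0%}, "
      f"report rate: {report_rate:.0%}")
print(f"median call duration: {median_call_duration}s, "
      f"median email-to-action: {median_time_to_action}s")
```

Breaking results out this way shows, for example, whether employees are failing because they answer and comply quickly, or because they comply only after long calls, which points to different training needs.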

Run it more than once. Training effects decay. Research shows they begin to fade after four months and can largely disappear after six without reinforcement. High-risk teams should run hybrid simulations quarterly. Broader populations benefit from at least semi-annual exposure.

Questions Security Leaders Ask About Hybrid Deepfake Attacks

What is a hybrid deepfake attack?
A hybrid deepfake attack is a coordinated social engineering attack that combines a phishing email with a follow-up AI-generated voice or video call. The email establishes context and primes the target to expect contact. The call uses a synthetic voice or deepfake identity to impersonate a trusted person and push the target to act.

How is a hybrid deepfake attack different from regular phishing?
Phishing relies on a single message being convincing enough to prompt action. A hybrid attack uses the email to lower resistance to the call, and the call to overcome any remaining skepticism from the email. Each channel validates the other, which is what makes the combination more effective than either alone.

Can employees learn to detect a deepfake voice on a live call?
Not reliably, especially under real-time social pressure. Studies show humans cannot consistently distinguish AI-generated speech from real speech under normal conditions, and performance gets worse when authority and urgency enter the interaction. The more durable defense is building verification habits: pause, call back through a known number, and report anything that applies unusual pressure to act immediately.

What makes a hybrid deepfake simulation realistic enough to change behavior?
The email and call must reference each other with consistent details. The AI must conduct a live, adaptive conversation rather than follow a static script. The voice and persona should match someone the target would reasonably expect to hear from. And the pressure applied should mirror real executive behavior in that organization, not an exaggerated test scenario.

Which employees should be tested first?
Finance and AP teams, executive assistants, IT helpdesk staff, HR teams, and new hires carry the highest risk based on role-level access and typical workflow exposure. Start there, establish a baseline, then expand to broader populations.

How often should hybrid simulations run?
Quarterly for high-risk roles is a strong baseline. For broader teams, semi-annual simulation with continuous phishing training in between maintains resistance without overloading employees. Annual compliance exercises alone are not enough to sustain behavioral change.

See What a Real Hybrid Attack Simulation Looks Like

Most teams don't realize how quickly a hybrid simulation can be configured until they walk through the workflow. In Brightside, admins set the attack goal, build the caller persona, select social engineering tactics and urgency level, choose or clone a voice, and generate the phishing email from one template builder. The AI call runs live, adapts to what the target says in real time, and references the email by design.

When an employee fails, follow-up training triggers automatically. The vishing dashboard tracks answer rate, fail rate trend, and median call duration, showing where in the call employees are most susceptible rather than just whether they passed or failed.