3D Phishing in 2026, A Comprehensive Engineering Guide
A practitioner level engineering guide to 3D phishing in 2026. Multimodal attack anatomy, attacker economics, orchestration sketches, detection rules across email, telephony, and video, hardened email auth configuration, identity controls, and a layered defense plan.

3D phishing is what happens when an adversary refuses to choose between email, voice, and video, and instead runs all three on a single target inside the same hour. The target gets an email that looks legitimate, a phone call that confirms the email, and a video meeting that "settles" the request. By the time the target says yes, three independent channels have all corroborated the same false story.
The attack is not new in concept. What is new is that one person can run it. The combined cost of voice cloning, real time inference, and live deepfake video has dropped from "nation state budget" to "weekend project". This guide is the comprehensive engineering view: how the attack is orchestrated, what it costs, what to detect, how to detect it, and what to configure across email, identity, telephony, and video conferencing platforms to break the chain.
This post is long. Skip to whatever section is relevant to your role:
- TL;DR for the executive summary.
- Attack anatomy for a stage by stage walkthrough.
- Orchestration sketch for what the attacker code looks like.
- Detection for SIEM rules across email, voice, and video.
- Email side for a hardened DMARC, DKIM, SPF, BIMI configuration.
- Voice side for SIP hardening and callback policy.
- Video side for liveness checks and meeting hygiene.
- Identity side for FIDO2, conditional access, and step up authentication.
- Defense plan for the layered control set we recommend.
- Threat model and metrics for board level reporting.
TL;DR
- 3D phishing is an orchestrated multimodal attack. Email, voice, and video are coordinated against the same target on the same timeline, usually inside a thirty minute window.
- The cost has dropped from "nation state" to "small team with a laptop". Open weights, real time inference, and cheap voice cloning are the key changes.
- Detection requires correlating signals across email, telephony, and video conference logs. Single channel detection misses the attack.
- Configuration that helps today: DMARC at p=reject with strict alignment, BIMI with verified marks, FIDO2 MFA, signed callback channels, hardened video meeting joins, and an enforced out of band confirmation rule for any high value request.
- Most "AI deepfake detection" tooling is unsafe as a primary control. Pilot before procuring. Use as a layer, not a gate.
- The single highest leverage human control remains a written, no exception, callback policy on any high value request.
Why 3D phishing exists now
3D phishing is the natural endpoint of three independent trends that converged inside the last twenty four months.
Voice cloning crossed the consumer threshold
In 2024, cloning a voice with conversational quality required serious compute and a lot of clean source audio. By 2026, several open weight models produce conversational quality clones from less than thirty seconds of source. They run on consumer hardware. Public APIs from commercial providers offer the same capability at a few cents per minute. Any senior leader who has appeared on a podcast, given a conference talk, or featured in a quarterly earnings call has more than enough public audio to be cloned cleanly.
Real time deepfake video left the lab
The same is true for video. Real time face swap deepfakes that struggled with extreme angles in 2024 now hold up across a four to six minute meeting under typical lighting and webcam quality. They are not perfect. They fail at extreme head turns, hand to face occlusion, and bright unfamiliar lighting. But they are good enough to fool a busy executive on a short Zoom call where the implicit assumption is that the person on screen is who they appear to be.
Email at scale is now AI authored, in good prose
The grammar tells in old phishing emails are gone. The "Sent from my iPhone" misspellings are gone. AI generated email content matches the voice of the impersonated organisation, references the right active project names harvested from public data, and arrives at scale. This is not deepfake email. It is just decent writing at industrial scale.
The combination of the three is 3D phishing. None of them on their own is a step change. Together, they are.
Attack anatomy
A 3D phishing engagement we have replayed end to end in red team exercises follows this pattern.
Stage 0, reconnaissance (days to weeks)
The attacker collects public artefacts and assembles a target dossier. This is not different from any spear phishing campaign. What is different is that the dossier feeds a multimodal kill chain, so it captures channel specific data.
For email: the target's correct signature format, the typical sender names and addresses they correspond with, the list of active vendor relationships, and the language tone the impersonated leader uses internally.
For voice: a clean audio sample of the impersonated leader, harvested from public sources. A sample list of known internal numbers to spoof. A sense of when the target is reachable by phone.
For video: a face sample of the impersonated leader. Calendar context inferred from public data ("the CFO is at the Singapore conference this week"). Typical meeting platforms used internally.
Stage 1, the email seed (T = 0)
The campaign begins with an email that creates context for the next two channels. The email itself is rarely the attack payload. Its job is to set the scene.
A typical seed email might say: "We need to confirm the wire to Acme Vendors before EOD. Mr. Mehra will call you in five minutes to walk you through it."
This email carries three pieces of information:
- The pretext (a wire to Acme Vendors).
- The expected next channel (a phone call).
- The expected timeline (five minutes).
The target's brain will, when the phone rings, fill in the rest. The attacker has shaped the cognitive frame the target uses to interpret the next event.
Stage 2, the voice payload (T + 5 min)
A cloned voice agent calls the target on a recognised number. The number is spoofed via SIP origination, often using a service that performs minimal abuse checks. The voice mirrors the email content and adds urgency.
The voice does not need to be a perfect clone. It needs to be a believable clone, in low quality phone audio, for the four to six minutes of the call. Phone audio strips a lot of the cues that would expose a clone in lossless audio.
Three patterns we see in real engagements:
- The voice complains about the connection or background noise as a pre emptive explanation for any audio artefacts.
- The voice sets a hard deadline ("I am about to walk into a meeting") to compress the target's decision window.
- The voice volunteers a piece of correct internal context (a project name, a colleague's name, an active vendor) to establish authenticity.
Stage 3, the video corroboration (T + 15 min)
A scheduled or impromptu video meeting features a deepfake video stream of the impersonated leader. The meeting is short, by design. The lighting is poor, by design. The camera angle is awkward, by design. None of those are accidents.
Three patterns we see:
- The meeting is short ("five minutes, just to confirm we are aligned").
- The deepfake leader appears briefly, then drops off citing connectivity ("can you finish this with my assistant, I have to jump").
- The video confirms the email and voice, without introducing new information that might trip a verification.
Stage 4, the action
The target authorises the wire, releases the credential, or installs the remote access tool. Channel agreement has done the social engineering. Three apparently independent sources (email, voice, video) all said the same thing, although all three were controlled by the same attacker. The target's cognitive model says this is real.
Stage 5, monetisation and cleanup
The attacker monetises (wire to attacker controlled account, credential exfiltrated to attacker controlled infrastructure, tool installed for persistence). The infrastructure rolls. The next campaign reuses the same playbook against a different target.
Orchestration sketch
What follows is a defensive simplification of orchestration code we have observed in real engagements. We are publishing the shape, not a working exploit kit. The point is to make detection logic concrete and to show that the attacker code is not magic. It is dumb sequencing on a clock.
from dataclasses import dataclass

# Defensive sketch only. The helper functions called below (render_template,
# send_email, place_voice_call, and so on) are intentionally left undefined.

@dataclass
class Target:
    name: str
    email: str
    phone: str
    meeting_handle: str
    voice_sample_url: str
    face_sample_url: str

@dataclass
class Persona:
    name: str
    title: str
    spoofed_email: str
    voice_model_id: str
    video_avatar_id: str
    phone_number_pool: list

def run_campaign(persona, target):
    # Stage 1: email seed at T = 0
    seed = render_template("wire_confirm_v3.txt", persona=persona, target=target,
                           active_vendors=enrich_vendor_context(target))
    send_email(persona.spoofed_email, target.email,
               "Wire confirmation required, EOD", seed)
    log_event("email_sent", target.email)

    # Stage 2: voice payload at T + 5 min
    sleep_until(t_plus_minutes=5)
    call_id = place_voice_call(
        from_number=pick_credible_number(persona.phone_number_pool, target),
        to_number=target.phone,
        voice_model_id=persona.voice_model_id,
        prompt=render_prompt("wire_followup_v3.txt", persona=persona, target=target),
        agent_loop=real_time_inference_loop,
    )
    log_event("voice_call_placed", call_id)

    # Stage 3: video corroboration at T + 15 min
    sleep_until(t_plus_minutes=15)
    meeting_id = schedule_video_meeting(
        host_persona=persona,
        target=target.meeting_handle,
        avatar_id=persona.video_avatar_id,
        duration_min=4,
        topic="Quick alignment on Acme wire",
    )
    log_event("video_invite_sent", meeting_id)

    # Stage 4: wait for the target to act
    monitor_for_action(target, deadline_minutes=60)

Three things to notice in this sketch.
First, the orchestration is dumb. It is just a sequence of channel actions on a clock, parameterised by the persona and target.
Second, every channel has its own provider. The email gateway, the SIP origination, the inference service, the video meeting platform. None of them are inherently malicious. All of them are abusable.
Third, every step leaves a log. Email gateways log sender, recipient, time. Telephony platforms log call origin, destination, duration. Video meeting platforms log invite sender, attendees, join times. Identity platforms log session activity. The attack is invisible only if you do not correlate. That is where defenders win.
Detection: stop looking at one channel at a time
The reason 3D phishing succeeds against mature SOCs is that mature SOCs run separate detection pipelines for email, voice, and video. The attack is invisible inside any single pipeline. The fix is correlation.
Sigma style pseudocode (correlation required)
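A classic Sigma detection rule targets a single log source and cannot correlate across sources on its own. Newer Sigma releases add a separate correlation rule type, but backend support varies, so the rule here is written as pseudocode that signals to a SIEM or XDR backend that cross source correlation is required. The shape: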
title: Possible 3D Phishing via Cross-Channel Novelty (Correlation Required)
id: 1d77a0f3-2c45-4d59-8a16-pivot-3dphish
status: experimental
description: >
  Correlates inbound email, inbound call, and meeting invite from previously
  unseen external entities targeting the same user within 30 minutes.
logsource:
  product: multi_source
detection:
  selection_email:
    event_type: email.inbound
    external: true
    sender_first_seen_within_30d: true
  selection_call:
    event_type: telephony.inbound
    caller_first_seen_within_30d: true
  selection_meeting:
    event_type: meeting.invite
    organiser_first_seen_within_30d: true
  timeframe: 30m
  condition: correlation(target_user, selection_email, selection_call, selection_meeting)
fields:
  - target_user
  - sender
  - caller
  - organiser
level: medium

Tune the time window per role. Finance teams often need a 60 minute window because legitimate vendor coordination spans longer. The point of writing this in Sigma shape is portability across SIEM vendors, with the explicit understanding that the correlation operator is implemented inside the SIEM, not by Sigma itself.
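To make the correlation logic concrete outside any particular SIEM, here is a minimal Python sketch. It assumes events have already been normalised into one record per channel event, with a flag saying whether the sender, caller, or organiser is new within the lookback. The `Event` class, the channel names, and `find_3d_candidates` are illustrative names, not part of any product.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta

CHANNELS = {"email", "call", "meeting"}

@dataclass(frozen=True)
class Event:
    channel: str         # "email", "call", or "meeting"
    target_user: str     # normalised recipient, callee, or invitee
    time: datetime
    source_is_new: bool  # sender/caller/organiser unseen in the lookback

def find_3d_candidates(events, window=timedelta(minutes=30)):
    """Return (user, first_event_time) where all three channels fired from
    previously unseen sources inside one sliding window."""
    by_user = defaultdict(list)
    for ev in events:
        if ev.source_is_new and ev.channel in CHANNELS:
            by_user[ev.target_user.lower()].append(ev)

    hits = []
    for user, evs in by_user.items():
        evs.sort(key=lambda e: e.time)
        for i, first in enumerate(evs):
            seen = set()
            for ev in evs[i:]:
                if ev.time - first.time > window:
                    break
                seen.add(ev.channel)
            if seen == CHANNELS:
                hits.append((user, first.time))
                break  # one alert per user is enough for this sketch
    return hits
```

The sliding window is anchored on each event in turn, so a sequence is caught wherever it falls on the clock. The per role tuning described above maps onto the `window` argument.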
KQL for Microsoft 365 stacks
If your organisation runs on Microsoft 365 with Sentinel, this version normalises the target user, expands multi recipient emails, avoids brittle triple joins, and uses time bucketing for the correlation:
let timeframe = 1h;
let lookback = 30d;
// Inbound external email from senders not seen during the lookback.
let emails =
    EmailEvents
    | where Timestamp > ago(timeframe)
    | where DeliveryAction == "Delivered"
    | where SenderFromAddress !endswith "@yourcorp.com"
    | mv-expand Recipient = split(ToRecipientsEmailAddresses, ";")
    // mv-expand yields dynamic values, so cast before using string functions.
    | extend TargetUser = tolower(trim(" ", tostring(Recipient)))
    | join kind=leftanti (
        EmailEvents
        | where Timestamp between (ago(lookback) .. ago(timeframe))
        | distinct SenderFromAddress
      ) on SenderFromAddress
    | project TargetUser, EmailTime=Timestamp, EmailSender=SenderFromAddress;
// Inbound calls from numbers not seen during the lookback.
let calls =
    TelephonyEvents
    | where Timestamp > ago(timeframe)
    | where Direction == "Inbound"
    | extend TargetUser = tolower(CalleeUserPrincipalName)
    | join kind=leftanti (
        TelephonyEvents
        | where Timestamp between (ago(lookback) .. ago(timeframe))
        | distinct CallerNumber
      ) on CallerNumber
    | project TargetUser, CallTime=Timestamp, Caller=CallerNumber;
// Meetings created by organisers not seen during the lookback.
let meetings =
    OfficeActivity
    | where TimeGenerated > ago(timeframe)
    | where Operation in ("MeetingCreated","MeetingScheduled")
    | extend TargetUser = tolower(UserId)
    | join kind=leftanti (
        OfficeActivity
        | where TimeGenerated between (ago(lookback) .. ago(timeframe))
        | distinct UserId
      ) on UserId
    | project TargetUser, MeetingTime=TimeGenerated, Organiser=UserId;
union
    (emails | extend Type="email", Time=EmailTime),
    (calls | extend Type="call", Time=CallTime),
    (meetings | extend Type="meeting", Time=MeetingTime)
| summarize
    EmailSenders = make_set_if(EmailSender, Type=="email"),
    Callers = make_set_if(Caller, Type=="call"),
    Organisers = make_set_if(Organiser, Type=="meeting"),
    EventTypes = make_set(Type)
    by TargetUser, bin(Time, 30m)
| where array_length(EventTypes) == 3

Why this version is more robust: no triple inner join, time buckets handle the correlation, and mv-expand covers multi recipient emails properly. Table and field names will vary per environment (the telephony table in particular depends on how you ingest call records). Tune the time bucket and the lookback for your organisation.
Splunk SPL equivalent (stats over join)
The same idea in Splunk, written without join because triple joins do not scale. Instead, combine three streaming searches with multisearch and aggregate by user and time bucket:
| multisearch
    [ search index=email_events sourcetype=email_inbound earliest=-1h external=true
      | eval TargetUser=lower(recipient), Type="email"
      | fields _time TargetUser sender Type ]
    [ search index=telephony sourcetype=call_log earliest=-1h direction=inbound
      | eval TargetUser=lower(callee_user), Type="call"
      | fields _time TargetUser caller_number Type ]
    [ search index=o365 sourcetype=office_activity earliest=-1h
        Operation IN ("MeetingCreated","MeetingScheduled")
      | eval TargetUser=lower(user), Type="meeting"
      | fields _time TargetUser user Type ]
| bin span=30m _time
| stats
    values(sender) as email_senders
    values(caller_number) as callers
    values(user) as organisers
    values(Type) as types
    by TargetUser _time
| where mvcount(types)=3

Why this is better: no join, which is a major performance win on real Splunk deployments. The correlation is the shared TargetUser and the shared 30 minute time bucket. Easier to tune, easier to extend. Index, sourcetype, and field names are placeholders for your own data model.
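One caveat applies to any fixed time bucket, including the 30 minute bins used in the queries here: a campaign that starts late in one bucket spills into the next, and neither bucket then contains all three event types. The Python sketch below illustrates the effect and one possible mitigation, which is to evaluate a second bucket grid shifted by half the span. Function names are illustrative. With two grids, any sequence no longer than half the span lands whole in at least one bucket.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
ALL_CHANNELS = {"email", "call", "meeting"}

def bucket(ts, span, offset=timedelta(0)):
    """Floor ts to the start of its bucket on a grid shifted by offset."""
    n = (ts - EPOCH - offset) // span
    return EPOCH + offset + n * span

def channels_per_bucket(events, span, offset=timedelta(0)):
    """events: iterable of (channel, timestamp) for one target user.
    Returns {bucket_start: set_of_channels}."""
    out = {}
    for channel, ts in events:
        out.setdefault(bucket(ts, span, offset), set()).add(channel)
    return out

def any_full_bucket(events, span=timedelta(minutes=30)):
    """True if a bucket on the aligned grid or the half-shifted grid
    contains all three channels."""
    events = list(events)
    for offset in (timedelta(0), span / 2):
        buckets = channels_per_bucket(events, span, offset)
        if any(chans == ALL_CHANNELS for chans in buckets.values()):
            return True
    return False
```

For example, an email at 10:20, a call at 10:25, and a meeting at 10:35 split across the 10:00 and 10:30 buckets on the aligned grid, but all fall inside the 10:15 bucket on the shifted grid.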
Behavioural signals beyond simple correlation
Time correlation alone is too noisy. Strengthen the signal with:
- Recipient role weighting. A finance, payroll, executive support, or vendor management role triggers higher weight. An engineer in product triggers lower weight (still relevant but less urgent).
- Action correlation. Within the same window, was there a wire request submitted, a credential reset, an MFA challenge, a vendor account change, a remote tool installation? An action signal pushes the alert to high priority.
- Geographic anomaly. Inbound caller country mismatched against expected counterparty country. Inbound email originated from infrastructure inconsistent with the claimed sender's known infrastructure (DMARC alignment failures count).
- Time of day anomaly. Off hours combinations are more suspicious than business hours.
The goal of the rule is not to catch every 3D phishing attempt. It is to catch the ones that matter, which means weighting toward action correlation and role.
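As a rough illustration of how these signals might be combined, here is a small Python scoring sketch. The weights, thresholds, role labels, and the `CorrelatedAlert` fields are all assumptions chosen for the example. They are not calibrated values and should be tuned against your own alert history.

```python
from dataclasses import dataclass

# Illustrative weights only. Roles not listed fall back to a weight of 1.
ROLE_WEIGHT = {
    "finance": 3,
    "payroll": 3,
    "executive_support": 3,
    "vendor_management": 3,
    "engineering": 1,
}

@dataclass
class CorrelatedAlert:
    role: str
    action_in_window: bool        # wire request, credential reset, tool install
    geo_mismatch: bool            # caller country vs expected counterparty
    dmarc_alignment_failed: bool
    off_hours: bool

def score(alert):
    """Sum the role weight and the behavioural signals."""
    total = ROLE_WEIGHT.get(alert.role, 1)
    if alert.action_in_window:
        total += 5                # action correlation dominates by design
    if alert.geo_mismatch:
        total += 2
    if alert.dmarc_alignment_failed:
        total += 2
    if alert.off_hours:
        total += 1
    return total

def priority(alert):
    """Map the score to a triage priority."""
    total = score(alert)
    if alert.action_in_window and total >= 8:
        return "high"
    return "medium" if total >= 5 else "low"
```

The action signal carries the largest weight so that a correlated alert with a payment or credential action in the same window always outranks one without.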
Email side, hardened authentication
Email is the first channel an attacker uses to seed the story. If your domain cannot be cleanly spoofed, the rest of the kill chain becomes harder. A working DMARC, DKIM, SPF, BIMI configuration looks like this:
; SPF
example.in. IN TXT "v=spf1 include:_spf.google.com include:mailgun.org -all"
; DKIM (selector google as example)
google._domainkey.example.in. IN TXT "v=DKIM1; k=rsa; p=MIGfMA0GCSqGSIb3..."
; Additional DKIM selector (rotation friendly)
mailgun._domainkey.example.in. IN TXT "v=DKIM1; k=rsa; p=MIGfMA0GCSqGSIb3..."
; DMARC at reject with strict alignment
_dmarc.example.in. IN TXT "v=DMARC1; p=reject; sp=reject; rua=mailto:[email protected]; ruf=mailto:[email protected]; fo=1; adkim=s; aspf=s; pct=100"
; BIMI with Verified Mark Certificate (VMC)
default._bimi.example.in. IN TXT "v=BIMI1; l=https://example.in/static/logo.svg; a=https://example.in/static/vmc.pem"
; MTA-STS for transport security
_mta-sts.example.in. IN TXT "v=STSv1; id=20260101000000Z"
; TLS-RPT for transport reporting
_smtp._tls.example.in. IN TXT "v=TLSRPTv1; rua=mailto:[email protected]"

Notes on this config:
- p=reject is non negotiable. p=quarantine is a halfway state where attackers can still land in spam folders, which is an attack vector for finance roles who scan spam looking for "missed" supplier emails.
- adkim=s and aspf=s enforce strict alignment. Most internal forwarding setups handle this fine. Test before flipping.
- sp=reject sets the same policy for subdomains. Many organisations set p=reject on the parent domain and forget about subdomains.
- DKIM rotation. Rotate keys at least every six months. Configure two selectors and rotate one at a time so deliverability does not break during the swap.
- BIMI with VMC raises the visual bar for attackers in supported clients. Not authentication on its own, but a useful trust signal that reduces the success rate of impersonation in inbox views that show sender logos.
- MTA-STS and TLS-RPT harden the transport layer and give you reporting on misrouted or downgraded mail.
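A quick way to keep the DMARC record honest is to lint the published TXT value against the posture described in these notes. The Python sketch below works on a record string you have already fetched (it does no DNS lookup), and the function names and finding messages are illustrative. It applies the DMARC defaults: sp falls back to p, alignment defaults to relaxed, and pct defaults to 100.

```python
def parse_dmarc(record):
    """Split a DMARC TXT value into a lowercase-keyed tag dictionary."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags

def audit_dmarc(record):
    """Return a list of findings. An empty list means the record matches
    the strict posture: reject, strict alignment, full coverage, reporting."""
    tags = parse_dmarc(record)
    findings = []
    if tags.get("v") != "DMARC1":
        findings.append("missing or wrong v=DMARC1")
    policy = tags.get("p", "").lower()
    if policy != "reject":
        findings.append("p is not reject")
    # When sp is absent, subdomains inherit the p policy.
    if tags.get("sp", policy).lower() != "reject":
        findings.append("subdomain policy is not reject")
    for tag in ("adkim", "aspf"):
        # Alignment defaults to relaxed ("r") when the tag is absent.
        if tags.get(tag, "r").lower() != "s":
            findings.append(f"{tag} is relaxed")
    if tags.get("pct", "100") != "100":
        findings.append("pct is below 100")
    if "rua" not in tags:
        findings.append("no aggregate report address (rua)")
    return findings
```

Run it against every owned domain and subdomain on a schedule, so that a policy weakened during a migration is noticed.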
Lookalike domains
DMARC protects your domain. It does not protect against domains that look like yours. Buy obvious lookalikes (Cyrillic homoglyphs, common typos, alternate TLDs) and park them with p=reject policies. Subscribe to a domain monitoring service (or write a small daily job that checks newly registered domains against a similarity threshold to your legitimate domain).
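The small daily job mentioned above could look roughly like the following Python sketch. It assumes you already have a feed of newly registered domain names from some source (the feed itself is not shown), folds a handful of common homoglyphs, and flags anything within a small edit distance of your legitimate domain. The homoglyph table is deliberately tiny and the threshold of 2 is an assumption to tune.

```python
# A tiny, illustrative homoglyph table: a few Cyrillic letters and digits
# that are commonly substituted for Latin letters. Real tables are larger.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic e
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0456": "i",  # Cyrillic i
    "0": "o",
    "1": "l",
})

def edit_distance(a, b):
    """Classic Levenshtein distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def is_lookalike(candidate, legit, max_distance=2):
    """True if candidate differs from legit but is close after folding."""
    cand, legit = candidate.lower(), legit.lower()
    if cand == legit:
        return False
    return edit_distance(cand.translate(HOMOGLYPHS),
                         legit.translate(HOMOGLYPHS)) <= max_distance

def flag_lookalikes(new_domains, legit):
    """Filter a feed of newly registered domains down to likely lookalikes."""
    return [d for d in new_domains if is_lookalike(d, legit)]
```

A plain edit distance treats a swapped TLD the same as a typo, which is usually what you want here. Expect false positives on short domains and review the output by hand.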
Inbound impersonation detection
Most modern mail security stacks include impersonation detection that flags external mail with display names matching internal executives. Turn this on. Combine with banner injection on external mail, especially mail mentioning urgency, money, credentials, or compliance.
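If your mail stack lacks this feature, or you want to understand what it does, the core check is simple. The Python sketch below is an illustration under stated assumptions: the executive name list, the internal domain set, the urgency terms, and the flag names are all hypothetical, and a real product does considerably more (fuzzy name matching, reply-to checks, sender reputation).

```python
from email.utils import parseaddr

# Illustrative trigger terms for the stronger banner.
URGENCY_TERMS = ("urgent", "wire", "payment", "password", "credential", "compliance")

def normalise(name):
    """Lowercase and collapse whitespace so display names compare cleanly."""
    return " ".join(name.lower().split())

def external_mail_flags(from_header, subject, exec_names, internal_domains):
    """Return the set of flags to apply to one inbound message."""
    display_name, address = parseaddr(from_header)
    domain = address.rpartition("@")[2].lower()
    flags = set()
    if domain in internal_domains:
        return flags  # internal mail is out of scope for this check
    flags.add("external_banner")
    if normalise(display_name) in {normalise(n) for n in exec_names}:
        flags.add("display_name_impersonation")
    if any(term in subject.lower() for term in URGENCY_TERMS):
        flags.add("urgency_banner")
    return flags
```

An external message whose display name matches an executive and whose subject mentions a wire would collect all three flags, which is exactly the shape of the seed email described in the attack anatomy.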
Voice side, signed caller identity and a hardened policy
Caller ID is not authentication. STIR/SHAKEN raises the bar in some markets but is patchy in India and absent in many regions. The realistic posture is to treat all unsolicited callers as unauthenticated, and require a callback on a known good number for any high value request.
In practice this means a written, enforced policy:
Any phone request involving money, credentials, vendor changes, or system access pauses the call. The receiver hangs up, retrieves the requester from the company directory, and calls back. The receiver is empowered to refuse to act if the callback fails. No exceptions, including for the CEO.
Reinforce the rule with:
- An internal directory of verified numbers maintained by IT, not by individual assistants. Attackers commonly social engineer the assistant first to plant a fake number.
- Dedicated finance and exec lines with restricted inbound rules. Where possible, route external inbound calls to these lines through a verification system or a human receptionist.
- Telephony log feed to SIEM. Without it, the correlation patterns above are impossible. Most modern PBX and VoIP platforms expose call detail records via API. Pipe them in.
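The callback rule is a human policy, but writing it as a decision function makes the one critical detail explicit: the callback number always comes from the IT maintained directory, never from the inbound caller ID or from anything the caller says. The Python sketch below uses hypothetical topic labels, directory entries, and return values.

```python
# Topics that trigger the callback rule, following the written policy.
HIGH_VALUE_TOPICS = {"money", "credentials", "vendor_change", "system_access"}

def handle_phone_request(topic, claimed_identity, verified_directory):
    """Decide what the receiver does with an inbound phone request.

    verified_directory maps a person to their IT verified number.
    Returns (decision, callback_number).
    """
    if topic not in HIGH_VALUE_TOPICS:
        return ("proceed", None)
    number = verified_directory.get(claimed_identity)
    if number is None:
        # No verified number means no callback is possible: refuse to act.
        return ("refuse", None)
    # Hang up and dial the directory number, regardless of the caller ID shown.
    return ("hang_up_and_call_back", number)
```

Note that the inbound caller ID is not even a parameter. That is deliberate: the function cannot be tricked by a spoofed number because it never looks at one.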
A shared safe word
Pick a phrase that lives in finance, executive, and engineering team members' heads, and nowhere else. Not on Slack, not in a wiki, not in an email. If a caller cannot produce the safe word on demand, the call ends. This sounds like spy fiction. It works because the attacker has no way to acquire it.
Video side, liveness checks and meeting hygiene
Live deepfake video is good but not perfect. A short liveness check defeats most current generation real time deepfakes.
Liveness checks that work today
- Head turn to ninety degrees. Real faces handle this cleanly. Most real time face swap tools warp at extreme angles because the training data does not include profile views captured under similar lighting.
- Hand to face occlusion. Ask the caller to hold their hand near their face, fingers across the cheek or jaw. Hands and faces interacting in real time still trip many real time avatars.
- Spoken phrase test. Ask the caller to read a phrase you choose on the spot, ideally one that includes uncommon words. Real time voice clones handle this, but timing artefacts often appear, and the combined audio plus video synchrony is harder to keep tight.
- Object presentation. Ask the caller to hold up a specific object that you can confirm is in their normal environment (a book on a shelf they have shown before, an item on their desk). Deepfakes do not have access to the impersonated person's actual environment.
- Out of band confirmation. A short text on a known good number, or a Slack DM on a verified internal account, asking "are you on a call with X right now". This is the strongest of the five.
Pseudocode for a defender's checklist
def liveness_check(participant, request):
    if not participant.is_known_internal:
        return require("calendar_invite_from_known_account_in_last_24h")
    if request.is_high_value:
        return require_all([
            "camera_on",
            "head_turn_test_passed",
            "hand_to_face_test_passed",
            "spoken_phrase_test_passed",
            "out_of_band_confirmation_received",
        ])
    return Decision.ALLOW

The point is not to ship this as code. The point is that liveness has to be explicit, not implicit. "I can see them" is no longer the same thing as "I have authenticated them".
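For teams that do want to encode the checklist, for example in a meeting bot or an approval form, here is one runnable interpretation in Python. The `Participant` class, the `Decision` enum, and the idea of tracking passed checks as a set are assumptions of this sketch. It also makes one deliberate change from the pseudocode: an external participant making a high value request must pass the liveness checks in addition to the calendar invite check.

```python
from dataclasses import dataclass, field
from enum import Enum

class Decision(Enum):
    ALLOW = "allow"
    BLOCK = "block"

INVITE_CHECK = "calendar_invite_from_known_account_in_last_24h"
HIGH_VALUE_CHECKS = [
    "camera_on",
    "head_turn_test_passed",
    "hand_to_face_test_passed",
    "spoken_phrase_test_passed",
    "out_of_band_confirmation_received",
]

@dataclass
class Participant:
    is_known_internal: bool
    passed_checks: set = field(default_factory=set)

def required_checks(participant, is_high_value):
    """List every check this participant must pass for this request."""
    checks = []
    if not participant.is_known_internal:
        checks.append(INVITE_CHECK)
    if is_high_value:
        checks.extend(HIGH_VALUE_CHECKS)
    return checks

def liveness_decision(participant, is_high_value):
    """Return (Decision, missing_checks). Block until nothing is missing."""
    missing = [c for c in required_checks(participant, is_high_value)
               if c not in participant.passed_checks]
    return (Decision.BLOCK if missing else Decision.ALLOW), missing
```

Returning the list of missing checks, rather than a bare yes or no, gives the person running the call a script for what to ask next.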
Meeting platform hygiene
- Restrict who can host meetings on internal accounts. Lock down host privileges so only verified internal users can organise meetings under your tenancy.
- Require waiting rooms for all meetings. The host explicitly admits each participant.
- Restrict screen sharing to host approval. A common attack pattern uses screen sharing of a fake "internal portal" to capture credentials.
- Audit meeting invites that were sent from an internal account to a single external recipient with no calendar context.
Identity side, FIDO2 and conditional access
Even if every preceding layer fails, identity controls can break the chain at the last mile.
FIDO2 / WebAuthn for high value accounts
Replace push and SMS MFA with FIDO2 / WebAuthn for any account that authorises money, accesses production systems, or holds executive privileges. Push and SMS factors are vulnerable to AiTM proxies. FIDO2 cryptographically binds the second factor to the legitimate origin, so a phishing site cannot relay the challenge.
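To see why the origin binding matters, consider the client data that the browser attaches to every WebAuthn assertion. It records the origin the user was actually on, and the relying party rejects any assertion whose origin is not its own. The Python sketch below shows only that one check, with a helper that simulates what a browser would produce. The function names are illustrative, and this is not a complete verifier: a real one also validates the signature, the RP ID hash, and the authenticator flags, so use a maintained WebAuthn library in production.

```python
import base64
import json

def b64url_decode(data):
    """Decode unpadded base64url text, as used in WebAuthn payloads."""
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))

def encode_client_data(origin, challenge, ceremony="webauthn.get"):
    """Simulate the clientDataJSON a browser builds for an assertion.
    The browser, not the page, fills in the origin."""
    payload = {"type": ceremony, "challenge": challenge, "origin": origin}
    raw = json.dumps(payload).encode("utf-8")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def client_data_matches(client_data_b64, expected_origin, expected_challenge,
                        expected_type="webauthn.get"):
    """Server-side check of type, challenge, and origin."""
    client_data = json.loads(b64url_decode(client_data_b64))
    return (client_data.get("type") == expected_type
            and client_data.get("challenge") == expected_challenge
            and client_data.get("origin") == expected_origin)
```

An AiTM proxy can forward the server's challenge to the victim, but the victim's browser stamps the proxy's origin into the client data, so the check fails. In practice the authenticator also refuses to use a credential scoped to a different relying party, which stops the relay even earlier.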
Conditional access
- Device hygiene. Require managed devices for high value applications. A login to the finance approval app from an unmanaged device should be blocked, not flagged.
- Geographic and time fencing. Restrict logins to expected geographies and hours for sensitive roles. Build exception flows.
- Step up authentication on sensitive actions. Even within an authenticated session, require fresh authentication for any payment over a defined threshold or any vendor account change.
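These three rules compose into a single decision, which the Python sketch below tries to make explicit. Every value in `POLICY` (countries, hours, the payment threshold, the five minute freshness limit) is a placeholder assumption, and the `AccessRequest` fields are hypothetical. In a real deployment this logic lives in your identity provider's conditional access engine, not in application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Placeholder policy values. Replace with your own.
POLICY = {
    "allowed_countries": {"IN", "SG"},
    "allowed_hours": range(7, 21),           # 07:00 to 20:59 local time
    "step_up_threshold": 10_000,             # payment amount needing fresh auth
    "step_up_max_age": timedelta(minutes=5),
}

@dataclass
class AccessRequest:
    device_managed: bool
    country: str
    hour_local: int
    now: datetime
    last_strong_auth: datetime
    payment_amount: float = 0.0
    vendor_account_change: bool = False

def evaluate(req, policy=POLICY):
    """Return one of: block, exception_flow, step_up, allow."""
    if not req.device_managed:
        return "block"                       # blocked, not merely flagged
    if (req.country not in policy["allowed_countries"]
            or req.hour_local not in policy["allowed_hours"]):
        return "exception_flow"              # route to a documented exception
    sensitive = (req.payment_amount > policy["step_up_threshold"]
                 or req.vendor_account_change)
    if sensitive and req.now - req.last_strong_auth > policy["step_up_max_age"]:
        return "step_up"                     # require fresh authentication
    return "allow"
```

The ordering matters: device posture is checked first because it is a hard block, while the step up check runs last because it only applies to sessions that are otherwise acceptable.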
Short lived sessions
A stolen session that expires in fifteen minutes is much less dangerous than one that expires in twelve hours. Tune session lifetimes for sensitive applications down to the minimum your operations can tolerate.
A defense plan, layered and prioritised
Here is what we recommend after these engagements, in order of impact per dollar.
Layer one, human policy
- Two minute callback rule on any high value phone request.
- Shared safe word among the finance, executive, and engineering teams.
- No urgency rule that requires a paper trail for any phone or video request.
- Out of band confirmation rule on any high value action across any channel.
Layer two, identity hardening
- FIDO2 / WebAuthn MFA for accounts with payment authority, production access, or executive privileges.
- Conditional access on device, geography, and time.
- Short session lifetimes for sensitive applications.
- Step up authentication on sensitive actions.
Layer three, email hardening
- DMARC p=reject with strict alignment on every owned domain and subdomain.
- DKIM with rotation and multiple selectors.
- BIMI with a Verified Mark Certificate.
- MTA-STS and TLS-RPT for transport hardening.
- Lookalike domain monitoring and parking.
- Inbound impersonation detection and external mail banner injection.
Layer four, telephony hygiene
- Internal verified number directory maintained by IT.
- Dedicated finance and exec lines with restricted inbound rules.
- Telephony log feed to SIEM for correlation.
Layer five, video meeting hygiene
- Liveness checklist embedded in meeting invite templates for high value calls.
- Restricted host privileges and waiting rooms.
- Screen share approval and audit trails.
Layer six, detection and correlation
- Identity centric correlation feed across email, telephony, video, and identity events.
- Sigma or KQL or SPL rules for time and action correlation.
- Behavioural weighting by role, action, geography, and time.
Layer seven, training and incident response
- Role specific simulations against finance, executive support, and engineering on a recurring cycle.
- Updated awareness content that reflects current AI generation capability.
- Pre written incident playbook for vishing and 3D phishing incidents.
- A relationship with your bank's fraud team before you need it.
- A no blame post mortem culture that surfaces incidents in real time, not days later.
Threat model and metrics
For board level reporting and program governance, we recommend a small set of metrics that track the program rather than the noise.
Metrics that move the program
- Percentage of accounts with FIDO2 enforced (targeted: 100% for accounts with payment authority within twelve months).
- Percentage of payment authorisations with verified out of band confirmation (targeted: 100%).
- Percentage of high value meetings with documented liveness check (targeted: 100% for any meeting authorising money or access).
- Time from phishing simulation click to detection (targeted: under fifteen minutes).
- Time from real incident report to credential or session revocation (targeted: under thirty minutes).
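These metrics are straightforward to compute once the underlying records exist. The Python sketch below is a minimal illustration: the record shapes (dictionaries with `fido2`, `oob_confirmed`, `reported_min`, and `revoked_min` keys) are assumptions for the example, and your identity platform, payment system, and ticketing tool will each export something different.

```python
from statistics import median

def pct(numerator, denominator):
    """Percentage with a safe zero for empty populations."""
    return 100.0 * numerator / denominator if denominator else 0.0

def program_metrics(payment_accounts, authorisations, incidents):
    """Compute three of the board level metrics from raw records.

    payment_accounts: [{"fido2": bool}, ...]
    authorisations:   [{"oob_confirmed": bool}, ...]
    incidents:        [{"reported_min": int, "revoked_min": int}, ...]
                      (minutes on a shared clock)
    """
    durations = [i["revoked_min"] - i["reported_min"] for i in incidents]
    return {
        "fido2_enforced_pct": pct(
            sum(a["fido2"] for a in payment_accounts), len(payment_accounts)),
        "oob_confirmed_pct": pct(
            sum(a["oob_confirmed"] for a in authorisations), len(authorisations)),
        "median_minutes_to_revocation": median(durations) if durations else None,
    }
```

The median is used for time to revocation so that a single slow incident does not hide an otherwise healthy response time, although the slow outliers deserve their own review.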
Metrics to ignore
- Generic phishing click rate without role weighting. Tells you almost nothing.
- Generic awareness training completion percentage. Compliance theatre.
- Number of phishing emails blocked. Volume metric, not effectiveness.
Industry data and trend
Indian CERT (CERT-In) reported a continued rise in multimodal phishing through 2025 and into early 2026, with a notable shift from single channel email phishing toward coordinated email plus voice plus video campaigns against finance functions. Numbers from external industry reports tell the same story, with at least three Indian financial services losses in 2025 attributable to deepfake assisted authorisation.
Treat the trend as load bearing. Plan for a year in which 3D phishing is the standard attack against any role with payment authority.
Closing checklist, what to ship this quarter
If you take five things from this guide into your environment this quarter, make them these:
- Move all owned domains and subdomains to DMARC p=reject with strict alignment. Confirm with a third party tool.
- Replace push and SMS MFA with FIDO2 / WebAuthn for any role that authorises money or production access.
- Write and circulate an out of band confirmation rule for finance, executive support, and engineering. Make it the default, not the exception. Test it with a simulation.
- Add a liveness checklist to the meeting invitation template for any high value call. Train the team to use it.
- Build a single identity centric correlation feed across email, telephony, video, and identity. Even if the only output is a slow alert, the correlation is your first real defense against this attack class.
If you want a 3D phishing simulation run end to end against your own team under a controlled engagement, we offer this as part of our adversary simulation programme. Request a briefing.
For the voice only version of this attack, see our AI voice phishing field guide. For the email only version, see our spear phishing 2026 guide.


