XPath for Web Scraping — Practical Guide
XPath is one of the most powerful tools for web scraping: it can select elements by their text content, navigate both down and up the DOM tree, and express conditions on page structure that CSS selectors cannot.
XPath Tester
Test XPath expressions against XML data with real-time evaluation. Extract elements, filter by attributes, and navigate XML document structures.
XPath Reference
| Expression | Description |
|---|---|
| `/` | Root element |
| `//element` | All matching elements anywhere |
| `./child` | Direct child of the context node |
| `@attr` | Attribute value |
| `[1]` | First element (XPath is 1-indexed) |
| `[last()]` | Last element |
| `[position()<3]` | First two elements |
| `[@attr='val']` | Filter by attribute value |
| `[contains(., 'text')]` | Contains text |
| `[starts-with(@id, 'x')]` | Attribute starts with prefix |
| `text()` | Text content |
| `node()` | Any node |
| `count(//el)` | Count matching elements |
| `sum(//el)` | Sum of numeric values |
| `string-length(//el)` | String length |
| `ancestor::el` | Ancestor axis |
| `descendant::el` | Descendant axis |
| `following-sibling::el` | Following siblings |
| `parent::el` | Parent axis |
| `el1 \| el2` | Union of two node sets |
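The expressions in the table can be exercised offline; here is a minimal sketch using Python's lxml library (a common XPath 1.0 engine, assumed installed) against a made-up catalog document:

```python
from lxml import etree

xml = """
<catalog>
  <book id="b1"><title>XML Basics</title><price>10</price></book>
  <book id="b2"><title>XPath Deep Dive</title><price>25</price></book>
  <book id="b3"><title>XSLT Cookbook</title><price>15</price></book>
</catalog>
"""
root = etree.fromstring(xml)

# Node-set functions return Python floats
print(root.xpath("count(//book)"))   # 3.0
print(root.xpath("sum(//price)"))    # 50.0

# Positional predicates (1-indexed)
print(root.xpath("//book[1]/title/text()"))    # ['XML Basics']
print(root.xpath("//book[last()]/@id"))        # ['b3']
print(root.xpath("//book[position()<3]/@id"))  # ['b1', 'b2']

# Text predicate
print(root.xpath("//title[contains(., 'XPath')]/text()"))  # ['XPath Deep Dive']
```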
About XPath
XPath (XML Path Language) is a query language for selecting nodes from XML documents. It uses path expressions to navigate through elements, attributes, and text in an XML tree structure.
- Nodes — elements, attributes, text, comments, and the document itself
- Axes — define the direction of navigation (child, parent, ancestor, descendant, sibling)
- Predicates — filter nodes with conditions inside square brackets
- Functions — built-in string, number, and node functions (contains, count, sum, etc.)
XPath is used in XSLT, XQuery, web scraping (Selenium, Puppeteer), configuration parsing, and XML data extraction. This tool uses your browser's built-in XPath 1.0 engine — no data is sent over the network.
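The axes are what most clearly set XPath apart from CSS selectors; a small sketch (Python with lxml, assumed available, on invented markup) showing sideways, upward, and downward navigation:

```python
from lxml import etree

doc = etree.fromstring(
    "<article>"
    "<h2>Intro</h2><p>First paragraph.</p><p>Second paragraph.</p>"
    "<span id='cta'>Add to Cart</span>"
    "</article>"
)

# following-sibling: the first <p> after the heading
print(doc.xpath("//h2/following-sibling::p[1]/text()"))  # ['First paragraph.']

# parent: walk upward from a matched node, then back down
print(doc.xpath("//span[@id='cta']/parent::article/h2/text()"))  # ['Intro']

# descendant: // is shorthand for /descendant-or-self::node()/
print(doc.xpath("count(/article/descendant::p)"))  # 2.0
```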
Common XPath patterns for scraping
The most useful XPath patterns for web scraping:

- `//a[contains(@href, '/product/')]`: matches product links by URL pattern
- `//div[contains(@class, 'price')]/text()`: extracts price text
- `//table//tr[position()>1]/td`: scrapes table rows, skipping the header
- `//img/@src`: gets all image URLs
- `//h2/following-sibling::p[1]`: gets the first paragraph after each heading
- `//*[contains(text(), 'Add to Cart')]/ancestor::div[@class]`: finds the container around an 'Add to Cart' button
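These patterns can be sanity-checked against a static snippet before pointing them at a live site; a sketch with lxml's HTML parser (assumed installed) and made-up markup:

```python
from lxml import html

page = html.fromstring("""
<html><body>
  <a href="/product/1">Widget</a>
  <a href="/about">About</a>
  <div class="price sale">$9.99</div>
  <table>
    <tr><th>Name</th></tr>
    <tr><td>Widget</td></tr>
  </table>
</body></html>
""")

# Product links by URL fragment
print(page.xpath("//a[contains(@href, '/product/')]/@href"))  # ['/product/1']

# Price text by partial class match
print(page.xpath("//div[contains(@class, 'price')]/text()"))  # ['$9.99']

# Table rows, skipping the header row
print(page.xpath("//table//tr[position()>1]/td/text()"))      # ['Widget']
```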
```python
# Selenium (Python)
from selenium.webdriver.common.by import By

driver.find_elements(By.XPATH, "//div[@class='product']")
driver.find_element(By.XPATH, "//button[text()='Submit']")
```

```javascript
// Puppeteer (JavaScript)
// Note: page.$x() was deprecated and removed in recent Puppeteer versions;
// pass the expression with the 'xpath/' prefix to the standard selector methods.
const elements = await page.$$("xpath///a[contains(@href, '/item/')]");
const prices = await page.$$("xpath///span[@class='price']");
```

```python
# Scrapy (Python)
response.xpath("//h1/text()").get()
response.xpath("//ul[@class='nav']//a/@href").getall()
```

Handling dynamic and complex pages
Modern web pages often use dynamic class names (e.g., css-1a2b3c) that change between builds. XPath handles this with partial matching: //*[contains(@class, 'product')] matches any element whose class contains 'product'. For pages with multiple similar sections, use positional predicates: (//div[@class='card'])[3] selects the third card. For shadow DOM or iframe content, you need to switch context first in your scraping tool before applying XPath.
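The partial-matching idea looks like this in practice (lxml again, with invented generated class names):

```python
from lxml import html

page = html.fromstring(
    "<div>"
    "<div class='product-card css-1a2b3c'>A</div>"
    "<div class='product-card css-9z8y7x'>B</div>"
    "<div class='sidebar css-4d5e6f'>C</div>"
    "</div>"
)

# Match on the stable class fragment, ignoring the generated suffix
print(page.xpath("//div[contains(@class, 'product')]/text()"))  # ['A', 'B']

# Positional predicate over the whole match set (note the parentheses:
# without them, [2] would apply per parent, not across all matches)
print(page.xpath("(//div[contains(@class, 'product')])[2]/text()"))  # ['B']
```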
Testing XPath before scraping
Always test your XPath expressions before writing scraping code. You can use this tool by pasting the page's HTML source, or test directly in browser DevTools: open the Console and run `document.evaluate("//your/xpath", document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null)`. In the Chrome DevTools Elements panel, press Ctrl+F and type your XPath expression to highlight matching elements on the page.
Frequently Asked Questions
How do I find the XPath of an element in Chrome DevTools?
Right-click the element on the page, select Inspect, then right-click the highlighted node in the Elements panel and choose Copy → Copy XPath (a short path, usually anchored on the nearest `id`) or Copy → Copy full XPath (the absolute path from the document root). You can also press Ctrl+F in the Elements panel and type an XPath expression to search; Chrome highlights matching elements and shows the count. This is the fastest way to test and refine XPath queries.
How do I handle namespaces in XPath?
XML namespaces can break XPath queries: an unprefixed step like //element only matches elements in no namespace, so it won't match <ns:element> or elements under a default xmlns. Solutions: (1) use local-name(): //*[local-name()='element'] ignores the namespace entirely, (2) in code, register a namespace resolver that maps prefixes to URIs, (3) strip namespaces from the XML before querying if you control the input. Most web scraping targets HTML (not XML), so namespaces are rarely an issue there.
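Both workarounds in code (lxml, with an invented namespace URI):

```python
from lxml import etree

xml = b"""<feed xmlns="http://example.com/ns">
  <entry>First</entry>
  <entry>Second</entry>
</feed>"""
root = etree.fromstring(xml)

# Unprefixed names do not match elements in the default namespace
print(root.xpath("//entry"))  # []

# Option 1: ignore the namespace with local-name()
print(root.xpath("//*[local-name()='entry']/text()"))  # ['First', 'Second']

# Option 2: register a prefix-to-URI mapping and use the prefix
print(root.xpath("//ex:entry/text()",
                 namespaces={"ex": "http://example.com/ns"}))  # ['First', 'Second']
```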
What is the best XPath strategy for stable scraping?
For resilient scrapers: (1) prefer semantic attributes over generated ones — @id, @name, @role, data-* attributes are more stable than CSS class names, (2) use text content as anchors — //label[text()='Email']/following::input[1] survives layout changes, (3) avoid deep absolute paths like /html/body/div[3]/div[2] because they break when the page structure changes, (4) combine contains() with class fragments for partial matching, (5) test with multiple pages to ensure your XPath works across variations.
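Point (2), anchoring on visible text, looks like this in practice (lxml, with hypothetical form markup):

```python
from lxml import html

form = html.fromstring("""
<form>
  <div><label>Name</label><input name="name"/></div>
  <div><label>Email</label><input name="email"/></div>
</form>
""")

# Anchor on the label text, then take the next <input> in document
# order; this survives wrapper and class-name changes around the field.
field = form.xpath("//label[text()='Email']/following::input[1]")[0]
print(field.get("name"))  # email
```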