How to Use XPath Extraction
Navigate the full document tree with XPath expressions. More powerful than CSS selectors for complex conditions, attribute access, and partial text matching.
What is it?
XPath (XML Path Language) is a query language for navigating HTML and XML documents. Unlike CSS selectors, XPath can:
- Match elements by their text content (e.g., find the `<td>` that contains "Total Price")
- Traverse upward in the DOM tree (find the parent of a matching child)
- Match elements by partial attribute values using `contains()`
- Access attribute values directly (e.g., `href`, `data-id`, `aria-label`)
If CSS selectors can't target the exact node you need, XPath almost certainly can.
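To make the four capabilities above concrete, here is a minimal Python sketch. It assumes the third-party lxml library as the XPath engine (this guide does not say which engine evaluates your expressions), and the HTML fragment is invented for illustration.

```python
from lxml import html  # third-party (pip install lxml); an assumption, not necessarily this tool's engine

# Hypothetical fragment, invented for illustration.
PAGE = """
<div>
  <table>
    <tr class="row total-row"><td>Total Price</td><td>$120.00</td></tr>
  </table>
  <a href="/checkout" data-id="42" aria-label="Go to checkout">Checkout</a>
</div>
"""

doc = html.fromstring(PAGE)

# 1. Match an element by its text content.
by_text = doc.xpath("//td[contains(text(),'Total Price')]")

# 2. Traverse upward: read the class of the row that holds the matching cell.
parent_class = doc.xpath("//td[text()='Total Price']/parent::tr/@class")

# 3. Match on a partial attribute value.
by_partial_attr = doc.xpath("//tr[contains(@class,'total')]")

# 4. Read attribute values directly.
href = doc.xpath("//a/@href")
label = doc.xpath("//a/@aria-label")

print(by_text[0].tag, parent_class, len(by_partial_attr), href, label)
```

Every call returns a list of matches, so an expression that matches nothing yields an empty list rather than an error.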
When to use it
- The element has no class or ID - you have to find it by its content or position
- You need to extract an attribute value rather than text (e.g., `href`, `src`, `data-price`)
- You need to find an element relative to another (e.g., the `<td>` next to the one that says "Price:")
- The site's markup uses generic tags with no useful class names
How to configure it
Pick XPath as your extraction method, then map field names to XPath 1.0 expressions:
{
"method": "xpath",
"fields": {
"price": "//span[contains(@class,'price')]/text()",
"rating": "//div[@data-testid='rating']/@aria-label"
}
}
- `/text()` - selects the text node of the element
- `/@attribute` - selects an attribute value
- If an expression matches multiple nodes, the first match is returned as a string
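As a rough mental model of the behaviour described above, the sketch below evaluates a `fields` mapping and keeps the first match of each expression as a string. It is not the tool's implementation: `extract_fields` is a hypothetical helper, lxml is an assumed engine, returning `None` when nothing matches is an assumption, and the sample page is invented.

```python
from typing import Dict, Optional

from lxml import html  # third-party (pip install lxml); assumed engine


def extract_fields(page_html: str, fields: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Hypothetical helper: evaluate each XPath and keep the first match as a string."""
    doc = html.fromstring(page_html)
    result: Dict[str, Optional[str]] = {}
    for name, expression in fields.items():
        matches = doc.xpath(expression)
        if not isinstance(matches, list):
            # Function expressions such as string(...) return a scalar, not a node list.
            result[name] = str(matches)
        elif not matches:
            result[name] = None  # assumption: no match yields None
        else:
            first = matches[0]
            # Element results carry no string form of their own, so use their text.
            result[name] = first.text_content() if hasattr(first, "text_content") else str(first)
    return result


# Hypothetical page with two price spans; only the first one is returned.
PAGE = """
<div>
  <span class="price sale">$19.99</span>
  <span class="price">$24.99</span>
  <div data-testid="rating" aria-label="4 out of 5 stars"></div>
</div>
"""

config = {
    "method": "xpath",
    "fields": {
        "price": "//span[contains(@class,'price')]/text()",
        "rating": "//div[@data-testid='rating']/@aria-label",
    },
}

extracted = extract_fields(PAGE, config["fields"])
print(extracted)
```

Because only the first match survives, add a position predicate or a tighter condition when the page contains several candidates.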
Finding XPath expressions
In Chrome DevTools: right-click the element → Inspect → right-click the node → Copy → Copy XPath.
Example 1 - Extract a price from an unlabelled span
Goal: Track a ticket price on a site that uses no meaningful class names.
Page HTML:
<div class="event-details">
<span>Venue:</span><span>Madison Square Garden</span>
<span>Date:</span><span>June 14, 2026</span>
<span>Price:</span><span>$85.00</span>
</div>
There's no class on the price span, but it always follows the span with "Price:" as text.
Configuration:
{
"method": "xpath",
"fields": {
"price": "//span[text()='Price:']/following-sibling::span[1]/text()",
"venue": "//span[text()='Venue:']/following-sibling::span[1]/text()"
}
}
Output:
{
"price": "$85.00",
"venue": "Madison Square Garden"
}
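You can check these expressions locally before saving them. The sketch below runs them against the example markup with lxml (an assumed engine). It also shows a whitespace-tolerant variant using the XPath 1.0 `normalize-space()` function, which may help if the label text carries stray spaces; the padded markup is invented for illustration.

```python
from lxml import html  # third-party (pip install lxml); assumed engine

PAGE = """
<div class="event-details">
<span>Venue:</span><span>Madison Square Garden</span>
<span>Date:</span><span>June 14, 2026</span>
<span>Price:</span><span>$85.00</span>
</div>
"""
doc = html.fromstring(PAGE)

price = doc.xpath("//span[text()='Price:']/following-sibling::span[1]/text()")
venue = doc.xpath("//span[text()='Venue:']/following-sibling::span[1]/text()")

# Hypothetical variant of the page where the label is padded with spaces.
PADDED = "<div><span> Price: </span><span>$85.00</span></div>"
padded_doc = html.fromstring(PADDED)

# The exact text comparison no longer matches the padded label...
exact = padded_doc.xpath("//span[text()='Price:']/following-sibling::span[1]/text()")
# ...while normalize-space() trims the label before comparing.
tolerant = padded_doc.xpath(
    "//span[normalize-space()='Price:']/following-sibling::span[1]/text()"
)

print(price, venue, exact, tolerant)
```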
Example 2 - Extract an attribute value (href)
Goal: Track the download URL for the latest release on a changelog page.
Page HTML:
<section class="latest-release">
<h2>v3.0.0</h2>
<a class="download-btn" href="https://releases.example.com/v3.0.0/app.zip">
Download
</a>
</section>
Configuration:
{
"method": "xpath",
"fields": {
"version": "//section[contains(@class,'latest-release')]/h2/text()",
"download_url": "//section[contains(@class,'latest-release')]//a[@class='download-btn']/@href"
}
}
Output:
{
"version": "v3.0.0",
"download_url": "https://releases.example.com/v3.0.0/app.zip"
}
Example 3 - Extract an aria-label for accessibility-driven data
Goal: Read a star rating from an element that stores it only in an aria-label.
Page HTML:
<div class="rating" data-testid="product-rating" aria-label="4.5 out of 5 stars">
★★★★½
</div>
Configuration:
{
"method": "xpath",
"fields": {
"rating": "//div[@data-testid='product-rating']/@aria-label"
}
}
Output:
{
"rating": "4.5 out of 5 stars"
}
Example 4 - Grab the Nth row of a table
Goal: Track the prices in the first and second rows of a pricing table.
Page HTML:
<table id="pricing-table">
<tr><td>Basic</td><td>$9/mo</td></tr>
<tr><td>Pro</td><td>$29/mo</td></tr>
<tr><td>Enterprise</td><td>$99/mo</td></tr>
</table>
Configuration:
{
"method": "xpath",
"fields": {
"basic_price": "//table[@id='pricing-table']/tr[1]/td[2]/text()",
"pro_price": "//table[@id='pricing-table']/tr[2]/td[2]/text()"
}
}
Output:
{
"basic_price": "$9/mo",
"pro_price": "$29/mo"
}
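One caveat when testing table expressions: browsers insert a `<tbody>` element when they parse a table, so the DOM you see in DevTools can differ from the raw HTML. The sketch below uses lxml (an assumed engine, which keeps the markup as written) to compare the direct-child path from the configuration with a descendant path (`//tr`) that tolerates both shapes. Which shape your monitor sees depends on its parser, which this guide does not specify.

```python
from lxml import html  # third-party (pip install lxml); assumed engine

ROWS = (
    "<tr><td>Basic</td><td>$9/mo</td></tr>"
    "<tr><td>Pro</td><td>$29/mo</td></tr>"
    "<tr><td>Enterprise</td><td>$99/mo</td></tr>"
)

# The table as written in the page source, and as a browser would build it.
as_written = html.fromstring(f'<table id="pricing-table">{ROWS}</table>')
with_tbody = html.fromstring(f'<table id="pricing-table"><tbody>{ROWS}</tbody></table>')

CHILD = "//table[@id='pricing-table']/tr[2]/td[2]/text()"        # rows must be direct children
DESCENDANT = "//table[@id='pricing-table']//tr[2]/td[2]/text()"  # rows may sit inside <tbody>

child_plain = as_written.xpath(CHILD)
child_tbody = with_tbody.xpath(CHILD)
descendant_plain = as_written.xpath(DESCENDANT)
descendant_tbody = with_tbody.xpath(DESCENDANT)

print(child_plain, child_tbody, descendant_plain, descendant_tbody)
```

If an expression works in the browser console but returns nothing in the monitor (or the reverse), a `<tbody>` mismatch is a likely cause worth checking.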
XPath quick reference
| Expression | What it does |
|---|---|
| `//div` | Any `<div>` anywhere in the document |
| `//div[@class='price']` | `<div>` with class exactly equal to "price" |
| `//div[contains(@class,'price')]` | `<div>` whose class contains "price" |
| `//span[text()='Total:']` | `<span>` whose text is exactly "Total:" |
| `/text()` | The text node of the selected element |
| `/@href` | The href attribute of the selected element |
| `/following-sibling::td[1]` | The first `<td>` sibling after the matched element |
| `/parent::div` | The parent `<div>` of the matched element |
| `[1]`, `[2]` | Position predicates (1-indexed) |
Tips
- `contains()` over exact match - `contains(@class,'price')` is more robust than `@class='price'` because the exact match breaks if the element has multiple classes.
- `/text()` vs `.` - `/text()` selects only the direct text nodes; `.` selects all text including nested tags. Use `/text()` to avoid picking up text from child elements.
- Test in the browser console - open DevTools, go to Console, and run `$x('your expression here')` to preview what XPath returns before saving the monitor.
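The first two tips can be seen side by side in a short sketch, again assuming lxml as the engine and an invented fragment. It also shows one thing to watch for: `contains(@class,'price')` is a substring test, so it matches a class such as `price-old` as well. The longer `concat(...)` expression is a common XPath 1.0 idiom for matching a whole class token.

```python
from lxml import html  # third-party (pip install lxml); assumed engine

# Hypothetical fragment, invented for illustration.
PAGE = (
    '<section>'
    '<div class="price featured">Now <b>$9</b> only</div>'
    '<span class="price-old">$12</span>'
    '</section>'
)
doc = html.fromstring(PAGE)

# Exact match fails because the div carries two classes.
exact = doc.xpath("//*[@class='price']")

# contains() matches the div, but also the span with class "price-old".
substring = doc.xpath("//*[contains(@class,'price')]")

# Matching the whole class token excludes "price-old".
token = doc.xpath("//*[contains(concat(' ', normalize-space(@class), ' '), ' price ')]")

# /text() returns only the div's own text nodes, skipping the nested <b>.
direct_text = doc.xpath("//div/text()")

# The string value of the element includes text from nested tags.
all_text = doc.xpath("string(//div)")

print(len(exact), len(substring), len(token), direct_text, all_text)
```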