How to Use XPath Extraction

Navigate the full document tree with XPath expressions. XPath is more powerful than CSS selectors for complex conditions, attribute access, and partial text matching.

What is it?

XPath (XML Path Language) is a query language for navigating HTML and XML documents. It covers several cases that CSS selectors handle poorly or not at all. XPath can:

Match elements by their text content (e.g., find the <td> that contains "Total Price")
Traverse upward in the DOM tree (find the parent of a matching child)
Match elements by partial text or partial attribute values using contains()
Access attribute values directly (e.g., href, data-id, aria-label)

If CSS selectors can't target the exact node you need, XPath almost certainly can.
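
To make the first two points concrete, here is a small Python sketch that matches a cell by its text and then steps up to its parent row. It assumes the third-party lxml library and an invented table; neither comes from this guide's setup, but the expressions themselves are plain XPath 1.0.

```python
from lxml import html  # third-party library, assumed for illustration

# Hypothetical page fragment, not taken from a real site
page = """<table>
  <tr class="line"><td>Subtotal</td><td>$40.00</td></tr>
  <tr class="summary"><td>Total Price</td><td>$43.20</td></tr>
</table>"""
tree = html.fromstring(page)

# Match a cell by its text, then step UP to the row that contains it
row_class = tree.xpath("//td[contains(text(),'Total Price')]/parent::tr/@class")

# From that row, step back down to the second cell
total = tree.xpath("//td[contains(text(),'Total Price')]/parent::tr/td[2]/text()")

print(row_class, total)
```

The same pattern works for any label and value pair: anchor on the text you can rely on, move to a shared ancestor, then select the node you actually want.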

When to use it

The element has no class or ID - you have to find it by its content or position
You need to extract an attribute value rather than text (e.g., href, src, data-price)
You need to find an element relative to another (e.g., the <td> next to the one that says "Price:")
The site's markup uses generic tags with no useful class names

How to configure it

Pick XPath as your extraction method, then map field names to XPath 1.0 expressions:

{
  "method": "xpath",
  "fields": {
    "price": "//span[contains(@class,'price')]/text()",
    "rating": "//div[@data-testid='rating']/@aria-label"
  }
}

/text() - selects the element's direct text nodes
/@attribute - selects an attribute value
When an expression matches more than one node, the first match is returned as a string
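
As a rough illustration of these rules, the sketch below evaluates a field map against a page and keeps the first match as a string. It uses Python with the third-party lxml library as a stand-in; extract_fields, first_as_string, the sample page, and the whitespace trimming are all assumptions made for illustration, not the service's actual implementation.

```python
from typing import Optional

from lxml import html  # third-party library, assumed for illustration


def first_as_string(matches) -> Optional[str]:
    """Reduce an XPath result to a single string (first match wins)."""
    if not isinstance(matches, list):
        # Expressions such as string(...) or count(...) return a scalar
        return str(matches)
    if not matches:
        return None
    first = matches[0]
    if isinstance(first, str):
        # Text nodes and attribute values come back as str in lxml
        return first.strip()  # trimming is an assumption of this sketch
    # An element was selected: fall back to all of its text
    return first.text_content().strip()


def extract_fields(page_html: str, fields: dict) -> dict:
    """Evaluate every XPath 1.0 expression in the field map."""
    tree = html.fromstring(page_html)
    return {name: first_as_string(tree.xpath(expr)) for name, expr in fields.items()}


# Hypothetical page with two price spans to show the first-match rule
page = """<div>
  <span class="price sale">$19.99</span>
  <span class="price">$24.99</span>
  <div data-testid="rating" aria-label="4 out of 5 stars"></div>
</div>"""

config = {
    "price": "//span[contains(@class,'price')]/text()",
    "rating": "//div[@data-testid='rating']/@aria-label",
}

result = extract_fields(page, config)
print(result)
```

Both spans match the price expression, but only the first one in document order ends up in the result.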

Finding XPath expressions

In Chrome DevTools: right-click the element → Inspect → right-click the node → Copy → Copy XPath.

Example 1 - Extract a price from an unlabelled span

Goal: Track a ticket price on a site that uses no meaningful class names.

Page HTML:

<div class="event-details">
  <span>Venue:</span><span>Madison Square Garden</span>
  <span>Date:</span><span>June 14, 2026</span>
  <span>Price:</span><span>$85.00</span>
</div>

There's no class on the price span, but it always directly follows the span whose text is "Price:".

Configuration:

{
  "method": "xpath",
  "fields": {
    "price": "//span[text()='Price:']/following-sibling::span[1]/text()",
    "venue": "//span[text()='Venue:']/following-sibling::span[1]/text()"
  }
}

Output:

{
  "price": "$85.00",
  "venue": "Madison Square Garden"
}
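
The [1] after following-sibling::span is worth a closer look. The sketch below runs the venue expression with and without it against the HTML from this example, using Python and the third-party lxml library (an assumption for illustration, not necessarily what runs behind the monitor).

```python
from lxml import html  # third-party library, assumed for illustration

page = """<div class="event-details">
  <span>Venue:</span><span>Madison Square Garden</span>
  <span>Date:</span><span>June 14, 2026</span>
  <span>Price:</span><span>$85.00</span>
</div>"""
tree = html.fromstring(page)

# Without a position predicate: every span that comes after "Venue:"
all_following = tree.xpath("//span[text()='Venue:']/following-sibling::span/text()")

# With [1]: only the nearest following span
nearest = tree.xpath("//span[text()='Venue:']/following-sibling::span[1]/text()")

print(len(all_following), all_following)
print(len(nearest), nearest)
```

Because the first match is what gets returned, both forms would yield the venue here, but [1] states the intent explicitly and keeps the expression from depending on match order.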

Example 2 - Extract an attribute value (href)

Goal: Track the download URL for the latest release on a changelog page.

Page HTML:

<section class="latest-release">
  <h2>v3.0.0</h2>
  <a class="download-btn" href="https://releases.example.com/v3.0.0/app.zip">
    Download
  </a>
</section>

Configuration:

{
  "method": "xpath",
  "fields": {
    "version": "//section[contains(@class,'latest-release')]/h2/text()",
    "download_url": "//section[contains(@class,'latest-release')]//a[@class='download-btn']/@href"
  }
}

Output:

{
  "version": "v3.0.0",
  "download_url": "https://releases.example.com/v3.0.0/app.zip"
}

Example 3 - Extract an aria-label for accessibility-driven data

Goal: Read a star rating from an element that stores it only in an aria-label.

Page HTML:

<div class="rating" data-testid="product-rating" aria-label="4.5 out of 5 stars">
  ★★★★½
</div>

Configuration:

{
  "method": "xpath",
  "fields": {
    "rating": "//div[@data-testid='product-rating']/@aria-label"
  }
}

Output:

{
  "rating": "4.5 out of 5 stars"
}

Example 4 - Grab the Nth row of a table

Goal: Track the prices of the first and second plans in a pricing table.

Page HTML:

<table id="pricing-table">
  <tr><td>Basic</td><td>$9/mo</td></tr>
  <tr><td>Pro</td><td>$29/mo</td></tr>
  <tr><td>Enterprise</td><td>$99/mo</td></tr>
</table>

Configuration:

{
  "method": "xpath",
  "fields": {
    "basic_price": "//table[@id='pricing-table']//tr[1]/td[2]/text()",
    "pro_price": "//table[@id='pricing-table']//tr[2]/td[2]/text()"
  }
}

Output:

{
  "basic_price": "$9/mo",
  "pro_price": "$29/mo"
}
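
One thing to watch with row-by-row paths: browsers add a <tbody> wrapper to tables in the DOM, and some pages include it in their markup. The sketch below (Python with the third-party lxml library, used here as an assumption) shows how a direct-child step such as table/tr[2] misses rows once a <tbody> is present, while a descendant step or a label-based predicate still matches.

```python
from lxml import html  # third-party library, assumed for illustration

# The pricing table from this example, with an explicit <tbody> wrapper
page = """<table id="pricing-table"><tbody>
  <tr><td>Basic</td><td>$9/mo</td></tr>
  <tr><td>Pro</td><td>$29/mo</td></tr>
  <tr><td>Enterprise</td><td>$99/mo</td></tr>
</tbody></table>"""
tree = html.fromstring(page)

# Direct child step: <tr> is no longer a child of <table>, so nothing matches
direct_child = tree.xpath("//table[@id='pricing-table']/tr[2]/td[2]/text()")

# Descendant step: matches with or without <tbody>
any_depth = tree.xpath("//table[@id='pricing-table']//tr[2]/td[2]/text()")

# Label-based: pick the row by its first cell instead of its position
by_label = tree.xpath("//table[@id='pricing-table']//tr[td[1]='Pro']/td[2]/text()")

print(direct_child, any_depth, by_label)
```

The label-based form also keeps working if rows are reordered or a new plan is inserted above the one you track.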

XPath quick reference

Expression	What it does
`//div`	Any `<div>` anywhere in the document
`//div[@class='price']`	`<div>` with `class` exactly equal to `"price"`
`//div[contains(@class,'price')]`	`<div>` whose `class` contains `"price"`
`//span[text()='Total:']`	`<span>` whose text is exactly `"Total:"`
`/text()`	The text node of the selected element
`/@href`	The `href` attribute of the selected element
`/following-sibling::td[1]`	The first `<td>` sibling after the matched element
`/parent::div`	The parent `<div>` of the matched element
`[1]`, `[2]`	Position predicates (1-indexed)

Tips

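contains() over exact match - contains(@class,'price') is more robust than @class='price' because an exact match fails as soon as the element carries more than one class (for example class="price sale"). Keep in mind that contains() is a substring test, so it also matches class names such as price-old.
/text() vs . - /text() selects only the element's direct text nodes; . evaluates to all text inside the element, including text in nested tags. Use /text() to avoid picking up text from child elements.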
Test in the browser console - open DevTools, go to Console, and run $x('your expression here') to preview what XPath returns before saving the monitor.
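
The first two tips can be checked side by side. This is a minimal sketch in Python with the third-party lxml library (an assumption for illustration); the page fragment is invented, and the concat/normalize-space expression is a common XPath 1.0 idiom for matching a whole class token rather than something this guide requires.

```python
from lxml import html  # third-party library, assumed for illustration

# Hypothetical fragment: a multi-class price span plus a look-alike class
page = """<div>
  <span class="price sale">$19.99 <small>per seat</small></span>
  <span class="price-note">taxes included</span>
</div>"""
tree = html.fromstring(page)

# Exact match fails because the class attribute is "price sale"
exact = tree.xpath("//span[@class='price']")

# Substring match finds the price span, but also "price-note"
partial = tree.xpath("//span[contains(@class,'price')]")

# Whole-token match: pad the class list with spaces and look for " price "
TOKEN = "//span[contains(concat(' ', normalize-space(@class), ' '), ' price ')]"
token = tree.xpath(TOKEN)

# /text() keeps only the span's own text; string() includes nested <small>
direct_text = tree.xpath(TOKEN + "/text()")
full_text = tree.xpath("string(" + TOKEN + ")")

print(len(exact), len(partial), len(token))
print(direct_text, repr(full_text))
```

If contains() starts returning an unexpected element as the first match, switching to the whole-token form is usually the smallest fix.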
