
How to Use XPath Extraction

Navigate the full document tree with XPath expressions. More powerful than CSS selectors for complex conditions, attribute access, and partial text matching.

Verid Guides·4 min read

What is it?

XPath (XML Path Language) is a query language for navigating HTML and XML documents. Unlike CSS selectors, XPath can:

  • Match elements by their text content (e.g., find the <td> that contains "Total Price")
  • Traverse upward in the DOM tree (find the parent of a matching child)
  • Match elements by partial text or attribute values using contains()
  • Access attribute values directly (e.g., href, data-id, aria-label)

If CSS selectors can't target the exact node you need, XPath almost certainly can.
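
The capabilities listed above can be tried outside the product with a few lines of Python. This is only a sketch: it assumes the third-party lxml library, and the sample table is invented for illustration. It is not meant to describe how Verid evaluates expressions internally.

```python
from lxml import html  # third-party: pip install lxml

# Hypothetical page fragment, invented for this illustration.
PAGE = """
<table>
  <tr><td>Subtotal</td><td>$80.00</td></tr>
  <tr><td>Total Price</td><td>$85.00</td></tr>
</table>
"""

tree = html.document_fromstring(PAGE)

# 1. Match an element by its text content.
label = tree.xpath("//td[contains(text(), 'Total Price')]")[0]

# 2. Traverse upward: the <tr> that is the parent of the matching <td>.
row = tree.xpath("//td[contains(text(), 'Total Price')]/parent::tr")[0]

# 3. Step sideways to the cell next to the label and read its text.
value = tree.xpath(
    "//td[text()='Total Price']/following-sibling::td[1]/text()"
)[0]

print(label.text)  # Total Price
print(row.tag)     # tr
print(value)       # $85.00
```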

When to use it

  • The element has no class or ID - you have to find it by its content or position
  • You need to extract an attribute value rather than text (e.g., href, src, data-price)
  • You need to find an element relative to another (e.g., the <td> next to the one that says "Price:")
  • The site's markup uses generic tags with no useful class names

How to configure it

Pick XPath as your extraction method, then map field names to XPath 1.0 expressions:

{
  "method": "xpath",
  "fields": {
    "price": "//span[contains(@class,'price')]/text()",
    "rating": "//div[@data-testid='rating']/@aria-label"
  }
}

  • /text() - selects the text node of the element
  • /@attribute - selects an attribute value
  • If an expression matches more than one node, only the first match is returned, as a string
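
To see the first-match rule in action, the field map can be approximated locally. The sketch below assumes the third-party lxml library. The helper names first_match_as_string and extract_fields, the sample page, the whitespace stripping, and the None result for "no match" are all illustrative assumptions rather than documented Verid behavior.

```python
from lxml import html  # third-party: pip install lxml


def first_match_as_string(tree, expression):
    """Return the first XPath match as a string, or None if nothing matches."""
    matches = tree.xpath(expression)
    if not isinstance(matches, list):
        # Expressions such as string(...) or count(...) return a scalar.
        return str(matches)
    if not matches:
        return None
    first = matches[0]
    if isinstance(first, str):
        # /text() and /@attribute results are already strings.
        return first.strip()
    # An element was selected: fall back to all of its text.
    return first.text_content().strip()


def extract_fields(page_html, fields):
    """Evaluate every field's XPath expression against one parsed page."""
    tree = html.document_fromstring(page_html)
    return {name: first_match_as_string(tree, xp) for name, xp in fields.items()}


# Hypothetical page with two spans whose class contains "price".
PAGE = """
<div class="product">
  <span class="price old">$99.00</span>
  <span class="price">$85.00</span>
  <div data-testid="rating" aria-label="4.5 out of 5 stars"></div>
</div>
"""

CONFIG = {
    "method": "xpath",
    "fields": {
        "price": "//span[contains(@class,'price')]/text()",
        "rating": "//div[@data-testid='rating']/@aria-label",
    },
}

result = extract_fields(PAGE, CONFIG["fields"])
print(result)
# {'price': '$99.00', 'rating': '4.5 out of 5 stars'}
```

Both spans match the price expression, and the first one in document order wins. If that is not the value you want, tighten the expression so that it matches a single node.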

Finding XPath expressions

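In Chrome DevTools: right-click the element → Inspect → right-click the node → Copy → Copy XPath. Copied expressions are often position-based and can break when the page layout changes, so consider rewriting them around stable attributes or text.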


Example 1 - Extract a price from an unlabelled span

Goal: Track a ticket price on a site that uses no meaningful class names.

Page HTML:

<div class="event-details">
  <span>Venue:</span><span>Madison Square Garden</span>
  <span>Date:</span><span>June 14, 2026</span>
  <span>Price:</span><span>$85.00</span>
</div>

There's no class on the price span, but it always comes directly after the span whose text is "Price:".

Configuration:

{
  "method": "xpath",
  "fields": {
    "price": "//span[text()='Price:']/following-sibling::span[1]/text()",
    "venue": "//span[text()='Venue:']/following-sibling::span[1]/text()"
  }
}

Output:

{
  "price": "$85.00",
  "venue": "Madison Square Garden"
}
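
The [1] in following-sibling::span[1] is what limits the result to the span directly after the label. A quick way to check this is the sketch below, which assumes the third-party lxml library and reuses the page HTML from this example.

```python
from lxml import html  # third-party: pip install lxml

PAGE = """<div class="event-details">
  <span>Venue:</span><span>Madison Square Garden</span>
  <span>Date:</span><span>June 14, 2026</span>
  <span>Price:</span><span>$85.00</span>
</div>"""

tree = html.document_fromstring(PAGE)

# With [1]: only the span immediately after the "Venue:" label.
with_index = tree.xpath(
    "//span[text()='Venue:']/following-sibling::span[1]/text()"
)

# Without [1]: every later sibling span, labels included.
without_index = tree.xpath(
    "//span[text()='Venue:']/following-sibling::span/text()"
)

print(with_index)
# ['Madison Square Garden']
print(without_index)
# ['Madison Square Garden', 'Date:', 'June 14, 2026', 'Price:', '$85.00']
```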

Example 2 - Extract an attribute value (href)

Goal: Track the download URL for the latest release on a changelog page.

Page HTML:

<section class="latest-release">
  <h2>v3.0.0</h2>
  <a class="download-btn" href="https://releases.example.com/v3.0.0/app.zip">
    Download
  </a>
</section>

Configuration:

{
  "method": "xpath",
  "fields": {
    "version": "//section[contains(@class,'latest-release')]/h2/text()",
    "download_url": "//section[contains(@class,'latest-release')]//a[@class='download-btn']/@href"
  }
}

Output:

{
  "version": "v3.0.0",
  "download_url": "https://releases.example.com/v3.0.0/app.zip"
}

Example 3 - Extract an aria-label for accessibility-driven data

Goal: Read a star rating from an element that stores it only in an aria-label.

Page HTML:

<div class="rating" data-testid="product-rating" aria-label="4.5 out of 5 stars">
  ★★★★½
</div>

Configuration:

{
  "method": "xpath",
  "fields": {
    "rating": "//div[@data-testid='product-rating']/@aria-label"
  }
}

Output:

{
  "rating": "4.5 out of 5 stars"
}

Example 4 - Grab the Nth row of a table

Goal: Track the prices of the first two plans in a pricing table by row position.

Page HTML:

<table id="pricing-table">
  <tr><td>Basic</td><td>$9/mo</td></tr>
  <tr><td>Pro</td><td>$29/mo</td></tr>
  <tr><td>Enterprise</td><td>$99/mo</td></tr>
</table>

Configuration:

{
  "method": "xpath",
  "fields": {
    "basic_price": "//table[@id='pricing-table']/tr[1]/td[2]/text()",
    "pro_price": "//table[@id='pricing-table']/tr[2]/td[2]/text()"
  }
}

Output:

{
  "basic_price": "$9/mo",
  "pro_price": "$29/mo"
}
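
Row-position expressions are sensitive to whether the parsed table contains a <tbody>. Browsers add one automatically, while the raw HTML above has none. The sketch below assumes the third-party lxml library, whose HTML parser is assumed to leave the table as written. It compares the direct path with a more tolerant //tr variant. The names RAW, BROWSER_DOM, DIRECT and TOLERANT are illustrative only.

```python
from lxml import html  # third-party: pip install lxml

RAW = """<table id="pricing-table">
  <tr><td>Basic</td><td>$9/mo</td></tr>
  <tr><td>Pro</td><td>$29/mo</td></tr>
  <tr><td>Enterprise</td><td>$99/mo</td></tr>
</table>"""

# The same table as a browser's DOM would show it, with <tbody> inserted.
BROWSER_DOM = RAW.replace(
    '<table id="pricing-table">', '<table id="pricing-table"><tbody>'
).replace("</table>", "</tbody></table>")

DIRECT = "//table[@id='pricing-table']/tr[2]/td[2]/text()"
TOLERANT = "//table[@id='pricing-table']//tr[2]/td[2]/text()"

raw_tree = html.document_fromstring(RAW)
dom_tree = html.document_fromstring(BROWSER_DOM)

results = {
    "raw/direct": raw_tree.xpath(DIRECT),
    "raw/tolerant": raw_tree.xpath(TOLERANT),
    "dom/direct": dom_tree.xpath(DIRECT),      # <tbody> sits between table and tr
    "dom/tolerant": dom_tree.xpath(TOLERANT),  # //tr matches at any depth
}

for name, value in results.items():
    print(name, value)
```

The tolerant form finds the Pro price in both variants, whereas the direct form returns nothing once a <tbody> is present. Which variant a given monitor sees depends on how the page is parsed, so it is worth testing the expression against the real page.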

XPath quick reference

Expression                         What it does
//div                              Any <div> anywhere in the document
//div[@class='price']              <div> with class exactly equal to "price"
//div[contains(@class,'price')]    <div> whose class contains "price"
//span[text()='Total:']            <span> whose text is exactly "Total:"
/text()                            The text node of the selected element
/@href                             The href attribute of the selected element
/following-sibling::td[1]          The first <td> sibling after the matched element
/parent::div                       The parent <div> of the matched element
[1], [2]                           Position predicates (1-indexed)

Tips

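  • contains() over exact match - contains(@class,'price') is more robust than @class='price' because the exact match fails as soon as the element carries more than one class (for example, class="price sale"). Keep in mind that contains() is a substring test, so it also matches classes such as "price-note".
  • /text() vs . - /text() selects only the element's direct child text nodes; . refers to the element itself, whose string value includes the text of all nested tags. Use /text() to avoid picking up text from child elements.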
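  • Test in the browser console - open DevTools, go to Console, and run $x('your expression here') to preview what the XPath returns before saving the monitor. Browsers insert a <tbody> into tables that lack one, so a path such as //table/tr[1] may return nothing in the console even though the raw HTML has no <tbody>.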
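
The first two tips can be checked with a short experiment. The sketch below assumes the third-party lxml library and an invented page fragment. The concat/normalize-space expression is a common XPath 1.0 idiom for matching a whole class token. It is shown here as an option and does not come from this guide's configuration.

```python
from lxml import html  # third-party: pip install lxml

# Hypothetical fragment: one multi-class span and one look-alike class.
PAGE = """
<div>
  <span class="price sale">$85.00 <small>per ticket</small></span>
  <span class="price-note">Prices include fees</span>
</div>
"""

tree = html.document_fromstring(PAGE)

# Exact match fails because the attribute value is "price sale".
exact = tree.xpath("//span[@class='price']")

# Substring match finds the price span, but also "price-note".
loose = tree.xpath("//span[contains(@class,'price')]")

# Whole-token match: pad the class list with spaces, then look for " price ".
token = tree.xpath(
    "//span[contains(concat(' ', normalize-space(@class), ' '), ' price ')]"
)

# /text() returns only the span's own text node...
direct_text = tree.xpath("//span[contains(@class,'sale')]/text()")

# ...while the element's string value also includes the nested <small>.
full_text = tree.xpath("string(//span[contains(@class,'sale')])")

print(len(exact), len(loose), len(token))  # 0 2 1
print(direct_text)                         # ['$85.00 ']
print(full_text)                           # $85.00 per ticket
```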
