Question

I am trying to scrape the text:

Eligible for free shipping with Amazon Prime.

on the All Offers page of this product, using the following XPaths:

.//*[@id='olpOfferList']/div/div/div[2]/div[1]/span[2]/i/span[contains(@class, '-')]
.//*[@id='olpOfferList']/div/div/div[2]/div[1]/span[2]/i/span

However, although both XPaths match in Firebug, they return an empty string in Selenium.

I am using roughly the following code to scrape the text:

    try {
        String scrapedText = driver.findElement(By.xpath(XPath)).getText();

    } catch (Exception e) {

        e.printStackTrace();
    }

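Edit: For some reason, the link on Stack Overflow does not redirect to the All Offers page (only to the main product page). To view the HTML of the All Offers page, append the following to amazon.com:

/gp/offer-listing/0615797806/ref=olp_f_new?ie=UTF8&f_all=true&f_new=true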

Update: Here is an HTML snippet from the page.

<div class="a-fixed-left-flipped-grid a-spacing-mini">
<div class="a-fixed-left-grid-inner" style="padding-left:170px">
<div id="olpOfferListColumn" class="a-fixed-left-grid-col a-col-right" style="padding-left:0%;width:100%;float:right;">
<div id="olpOfferList" class="a-section olpOfferList">
<div class="a-section a-padding-small">
<div class="a-section a-spacing-double-large" role="grid" aria-readonly="true" aria-label="More buying choices">
<div class="a-row a-spacing-mini" role="row">
<hr class="a-spacing-mini a-divider-normal"/>
<div class="a-row a-spacing-mini olpOffer" role="row">
<div class="a-column a-span2 olpPriceColumn" role="gridcell">
<span class="a-size-large a-color-price olpOfferPrice a-text-bold">                $10.79                </span>
<span class="supersaver">
<i class="a-icon a-icon-prime" aria-label="Eligible for free shipping with Amazon Prime.">
<span class="a-icon-alt">Eligible for free shipping with Amazon Prime.</span> // I want to scrape this text
</i>
</span>
<p class="olpShippingInfo">
</div>
<div class="a-column a-span3 olpConditionColumn" role="gridcell">
<div class="a-column a-span3 olpDeliveryColumn" role="gridcell">
<div class="a-column a-span2 olpSellerColumn" role="gridcell">
<div class="a-column a-span2 olpBuyColumn a-span-last" role="gridcell">
</div>

Answer 1

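I'll try to answer my own question with something that "gets the job done", but I am still looking for a better answer.

If, instead of scraping with getText(), I scrape with getAttribute() using the attribute "textContent", I can successfully retrieve the text.

However, while this technically answers the question (or at least works around the underlying problem), I am still looking for a way to do this directly with getText(), so I consider this only a partial answer. I am also trying to understand why my original code does not work.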
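A likely explanation for the difference: Selenium documents getText() as returning only the visible text of an element (text not hidden by CSS), whereas textContent is a plain DOM property that ignores styling. If Amazon's stylesheet hides the a-icon-alt span (an assumption, since the CSS is not shown in the question), getText() would return an empty string even though the locator is correct. The sketch below is not Selenium code. It uses the JDK's own DOM and XPath classes on a trimmed, well-formed copy of the snippet (closing tags added by hand, class name TextContentSketch is hypothetical) to check that both original XPaths do select the span and that its text content holds the wanted string.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TextContentSketch {
    // Trimmed copy of the markup from the question. Assumption: closing tags
    // were added and unrelated elements dropped so that it parses as XML.
    static final String HTML =
          "<div id='olpOfferList'><div><div>"
        + "<div role='row'/>"
        + "<div role='row'>"
        + "<div>"
        + "<span>$10.79</span>"
        + "<span class='supersaver'><i class='a-icon a-icon-prime'>"
        + "<span class='a-icon-alt'>Eligible for free shipping with Amazon Prime.</span>"
        + "</i></span>"
        + "</div></div>"
        + "</div></div></div>";

    // Returns the DOM text content of the first node matched by the XPath,
    // or null when nothing matches. Styling plays no role here.
    static String textContentAt(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(HTML)));
            Node node = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODE);
            return node == null ? null : node.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        String base = ".//*[@id='olpOfferList']/div/div/div[2]/div[1]/span[2]/i/span";
        System.out.println(textContentAt(base));
        System.out.println(textContentAt(base + "[contains(@class, '-')]"));
    }
}
```

In Selenium itself, the corresponding call is the one described above: getAttribute("textContent") on the element found by the same XPath, in place of getText().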

Answer 2

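Based on the HTML you shared, the class supersaver appears to be unique. You can therefore use the following line (note the single quotes inside the XPath, so that the Java string literal is not terminated early):

    String scrapedText = driver.findElement(By.xpath("//span[@class='supersaver']/i/span")).getText();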

Answer 3

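Your HTML contains an aria-label attribute, and you can locate the element by that as well. Match on just a word or two of its value, such as "Eligible" or "free shipping", to identify it.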
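As a rough illustration of matching on part of the aria-label, the sketch below evaluates such an XPath with the JDK's built-in XPath engine against a trimmed, well-formed copy of the snippet from the question. This is a stand-in for the browser, not Selenium code, and the class name AriaLabelSketch is hypothetical. Reading the attribute value rather than the element text also sidesteps any question of whether the inner span is visible.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class AriaLabelSketch {
    // Trimmed copy of the markup from the question, kept well-formed.
    static final String HTML =
          "<span class='supersaver'>"
        + "<i class='a-icon a-icon-prime' "
        + "aria-label='Eligible for free shipping with Amazon Prime.'>"
        + "<span class='a-icon-alt'>Eligible for free shipping with Amazon Prime.</span>"
        + "</i></span>";

    // Returns the string value of the first match, or "" when nothing matches.
    static String evaluate(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(HTML)));
            return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
    }

    public static void main(String[] args) {
        // Partial match on the attribute value, then read the whole value.
        System.out.println(evaluate("//i[contains(@aria-label, 'free shipping')]/@aria-label"));
    }
}
```

With Selenium, a comparable (untested here) approach would be to find the element with By.xpath("//i[contains(@aria-label, 'free shipping')]") and read its value with getAttribute("aria-label").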

Unable to scrape span text via XPath & WebDriver
