Question

Question 1

Here is the HTML code.

<div class="preferredContact paragraph">ph:<span preferredcontact="40">(02) 9540 9959</span></div>

I am trying to extract that phone number using XPath.

I tried

data['phone'] = c.xpath('.//span[@preferredContact="40"]/text()')

and

data['phone'] = c.xpath('.//span[contains(@preferredContact,"40")]/text()')

Both of them return only null. Can someone tell me the code to extract that phone number?

Question 2

The HTML code

<a rel="nofollow" title="View website for Ruth Newman Architect (in new window)" target="_blank" name="listing_website" id="websiteLink40" alreadysentorpevent="false" class="links ext-no-tooltip orpDuplicateEvent" href="/app/redirect?headingCode=27898&amp;productId=473639214&amp;productVersion=1&amp;listingUrl=%2Fnsw%2Fgymea-bay%2Fruth-newman-architect-12781682-listing.html&amp;webSite=http%3A%2F%2Fwww.ruthnewman.com.au&amp;pt=w&amp;context=businessTypeSearch&amp;referredBy=YOL&amp;eventType=websiteReferral">www.ruthnewman.com.au
</a>

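I want to get the link that comes right after the string webSite=http%3A%2F%2F. This string is inside the value of the href attribute. So, in the example above, I want www.ruthnewman.com.au. I don't know how to do this with XPath.

Can someone help?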

Answer 1

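Attribute names are case-sensitive in XPath. For the first question, use the all-lowercase name, exactly as it appears in the markup: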

.//span[@preferredcontact='40']/text()
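To see the case-sensitivity in isolation, here is a minimal sketch. It assumes the standard library's `xml.etree.ElementTree` (which supports only a subset of XPath) rather than whatever parser the question uses, so that it runs without third-party packages; the variable `c` merely stands in for the question's context node.

```python
import xml.etree.ElementTree as ET

# The snippet from the question; it happens to be well-formed XML too.
snippet = (
    '<div class="preferredContact paragraph">ph:'
    '<span preferredcontact="40">(02) 9540 9959</span></div>'
)
c = ET.fromstring(snippet)  # stand-in for the question's context node

# Lowercase attribute name, as written in the markup: this matches.
match = c.find(".//span[@preferredcontact='40']")
phone = match.text

# Mixed-case attribute name, as in the question's attempts: no match.
miss = c.find(".//span[@preferredContact='40']")

print(phone)  # (02) 9540 9959
print(miss)   # None
```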

For the second question:

substring-before(substring-after(
    .//a[contains(@href, 'webSite=')]/@href, 'webSite=http%3A%2F%2F'), '&')

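This first selects everything in the attribute value after 'webSite=http%3A%2F%2F', then passes that to substring-before to extract everything before the next &, which should be the target string.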
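The same two-step logic can be sketched in plain Python. This is an illustration under assumptions, not the answer's method: the standard library's `ElementTree` does not implement `substring-after` or `substring-before`, so `str.partition` stands in for them here, and the `href` is a shortened copy of the one in the question. An XPath 1.0 engine such as lxml's should accept the expression above directly.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

# Shortened copy of the question's link (most query parameters omitted).
snippet = (
    '<a id="websiteLink40" href="/app/redirect?headingCode=27898'
    '&amp;webSite=http%3A%2F%2Fwww.ruthnewman.com.au&amp;pt=w">'
    'www.ruthnewman.com.au</a>'
)
a = ET.fromstring(snippet)
href = a.get("href")  # the parser has already turned &amp; into &

# Step 1: like substring-after, keep what follows the marker.
marker = "webSite=http%3A%2F%2F"
after = href.partition(marker)[2]

# Step 2: like substring-before, keep what precedes the next '&'.
# Note: if no '&' is present, partition returns the whole string,
# whereas XPath's substring-before would return an empty string.
site = after.partition("&")[0]

# Alternative: parse the query string, which also decodes %3A%2F%2F.
full_url = parse_qs(urlsplit(href).query)["webSite"][0]

print(site)      # www.ruthnewman.com.au
print(full_url)  # http://www.ruthnewman.com.au
```

Parsing the query string is less tied to the exact `http%3A%2F%2F` prefix, so it may hold up better if some listings use a different scheme.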

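Note that the descendant-or-self axis (//) is not really needed in the examples you gave. Avoid it wherever possible: the flexibility it buys comes at the cost of precision and efficiency.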
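The precision cost can be shown with a small sketch. The markup below is hypothetical (a second, nested span added purely for illustration), and the standard library's `ElementTree` is assumed again:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one span directly under the div, one nested deeper.
snippet = (
    '<div>'
    '<span preferredcontact="40">(02) 9540 9959</span>'
    '<p><span preferredcontact="40">nested</span></p>'
    '</div>'
)
div = ET.fromstring(snippet)

# Child step only: matches just the span directly under the div.
direct = [s.text for s in div.findall("span[@preferredcontact='40']")]

# Descendant search: walks the whole subtree and also picks up the nested span.
anywhere = [s.text for s in div.findall(".//span[@preferredcontact='40']")]

print(direct)    # ['(02) 9540 9959']
print(anywhere)  # ['(02) 9540 9959', 'nested']
```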

Need help extracting data using XPath in my Python code
