I'm trying to get the "Shipping Weight" value (i.e., 4.65 pounds) from an Amazon page. Here is the relevant part of the markup (from https://www.amazon.com/dp/B0018RSEMU):
<div class="a-row a-spacing-top-base">
<div class="a-column a-span6">
<div class="a-row a-spacing-base">
<div class="a-section table-padding">
<table id="productDetails_detailBullets_sections1" class="a-keyvalue prodDetTable" role="presentation">
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Shipping Weight
</th>
<td class="a-size-base">
4.65 pounds (<a href="https://www.amazon.com/gp/help/seller/shipping.html/ref=dp_pd_shipping?_encoding=UTF8&seller=ATVPDKIKX0DER&asin=B0018RSEMU">View shipping rates and policies</a>)
</td>
</tr>
......
This is my code:
from lxml import html
import requests

url = 'https://www.amazon.com/dp/B0018RSEMU'
headers = {'User-Agent': '...'}
page = requests.get(url, headers=headers)
doc = html.fromstring(page.content)
# Select the text of the <td> that follows the "Shipping Weight" <th>
XPATH_WEIGHT = '//th[contains(text(),"Shipping Weight")]/following-sibling::td/text()'
RAW_WEIGHT = doc.xpath(XPATH_WEIGHT)
When I run this, it returns nothing. What's wrong? Using the same syntax, I can correctly get the text of other tags, so I'm quite confused here.
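
To separate the XPath from the download step, here is a minimal sketch that runs the same expression against a trimmed copy of the markup quoted above instead of the live page. `SNIPPET` and `extract_weight` are hypothetical names introduced only for this check, the `href` is shortened, and the closing `</table>` is added so the fragment is complete. If this prints text while the live request does not, one possibility (an assumption, not something verified here) is that the HTML returned to `requests` differs from what the browser shows.

```python
from lxml import html

# Trimmed copy of the quoted markup (hypothetical test fixture; href shortened).
SNIPPET = """
<table id="productDetails_detailBullets_sections1" class="a-keyvalue prodDetTable" role="presentation">
<tr>
<th class="a-color-secondary a-size-base prodDetSectionEntry">
Shipping Weight
</th>
<td class="a-size-base">
4.65 pounds (<a href="https://www.amazon.com/gp/help/seller/shipping.html">View shipping rates and policies</a>)
</td>
</tr>
</table>
"""

# Same expression as in the question.
XPATH_WEIGHT = '//th[contains(text(),"Shipping Weight")]/following-sibling::td/text()'


def extract_weight(markup):
    """Run XPATH_WEIGHT on the markup and return the non-empty, stripped text nodes."""
    doc = html.fromstring(markup)
    parts = [text.strip() for text in doc.xpath(XPATH_WEIGHT)]
    return [part for part in parts if part]


snippet_result = extract_weight(SNIPPET)
print(snippet_result)
```

The same helper can be called with `page.content` from the live request, so the two results can be compared side by side.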