Question

In Beautiful Soup I can find_all elements, then iterate over them and look for some other_elements inside each one. That is:

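import bs4

soup = bs4.BeautifulSoup(source, 'html.parser')
# `class` is a reserved word in Python, so Beautiful Soup takes `class_`
articles = soup.find_all('div', class_='v-card')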
for article in articles:
    name = article.find('span', itemprop='name').text
    address = article.find('p', itemprop='address').text

Now I'm trying to do the same thing in lxml:

from lxml import html

tree = html.fromstring(source)
items = tree.xpath('//div[@class="v-card"]')
for item in items:
    name = item.xpath('//span[@itemprop="name"]/text()')
    address = item.xpath('//p[@itemprop="address"]/text()')

...but this finds every match in the whole tree, whether or not it is inside the current item. How do I handle this?

Answer 1

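Don't prefix the subsequent queries with //, which explicitly makes the search start at the root of the document rather than at the current element. Use .// instead to make the query relative to the current element: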

for item in tree.xpath('//div[@class="v-card"]'):
    # './/' searches only the descendants of the current item
    name = item.xpath('.//span[@itemprop="name"]/text()')
    address = item.xpath('.//p[@itemprop="address"]/text()')
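A minimal, self-contained sketch contrasting the two forms. It assumes a made-up HTML snippet shaped like the question's markup; the names Alice/Bob, the addresses, and the helper functions absolute_names and relative_cards are hypothetical, for illustration only:

```python
from lxml import html

# Hypothetical sample markup: two cards, each with its own name and address.
source = """
<html><body>
  <div class="v-card">
    <span itemprop="name">Alice</span>
    <p itemprop="address">1 Main St</p>
  </div>
  <div class="v-card">
    <span itemprop="name">Bob</span>
    <p itemprop="address">2 Oak Ave</p>
  </div>
</body></html>
"""

tree = html.fromstring(source)


def absolute_names(tree):
    # '//' starts at the document root, so every card sees ALL names.
    return [item.xpath('//span[@itemprop="name"]/text()')
            for item in tree.xpath('//div[@class="v-card"]')]


def relative_cards(tree):
    cards = []
    for item in tree.xpath('//div[@class="v-card"]'):
        # './/' starts at the current element, so only its descendants match.
        name = item.xpath('.//span[@itemprop="name"]/text()')
        address = item.xpath('.//p[@itemprop="address"]/text()')
        cards.append((name, address))
    return cards


print(absolute_names(tree))  # [['Alice', 'Bob'], ['Alice', 'Bob']]
print(relative_cards(tree))  # [(['Alice'], ['1 Main St']), (['Bob'], ['2 Oak Ave'])]
```

Note that xpath() with text() always returns a list, unlike Beautiful Soup's .text, which returns a single string; indexing with [0] (after checking the list is non-empty) is one way to get a plain string per card.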

