Question

Given an HTML structure like this:

<dd itemprop="actors">
    <span itemscope="" itemtype="http://schema.org/Person">
        <a itemprop="name">Yumi Kazama</a>,
    </span>
    <span itemscope="" itemtype="http://schema.org/Person">
        <a itemprop="name">Yuna Mizumoto</a>,
    </span>
    <span itemscope="" itemtype="http://schema.org/Person">
        <a itemprop="name">Rei Aoki</a>,
    </span>
</dd>

How can I get the text of every a element that has itemprop="name"?

The XPath:

//*[@itemprop='actors']//*[@itemprop='name']/text()

only returns the first a/text().

Answer 1

Assuming your HTML file is test.html, the following should work:

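from lxml import html

# Read the saved page and parse it into an element tree.
with open(r'E:/backup/GoogleDrive/py/scrapy/test.html', "r") as f:
    page = f.read()
tree = html.fromstring(page)
# Collect the text nodes of every <a itemprop="name"> element.
names = tree.xpath("//a[@itemprop='name']//text()")
print(names)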
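To try this without a file on disk, here is a minimal sketch that embeds the question's markup as a string. It assumes lxml is installed; the names SNIPPET and extract_names are made up for illustration. With this markup, the expression from the question should also match all three names, so if only one comes back it may be worth checking how the result list is consumed afterwards.

```python
from lxml import html

# The markup from the question, inlined so the example is self-contained.
SNIPPET = """
<dd itemprop="actors">
    <span itemscope="" itemtype="http://schema.org/Person">
        <a itemprop="name">Yumi Kazama</a>,
    </span>
    <span itemscope="" itemtype="http://schema.org/Person">
        <a itemprop="name">Yuna Mizumoto</a>,
    </span>
    <span itemscope="" itemtype="http://schema.org/Person">
        <a itemprop="name">Rei Aoki</a>,
    </span>
</dd>
"""


def extract_names(markup, expression):
    """Return the stripped text nodes matched by an XPath expression."""
    tree = html.fromstring(markup)
    # xpath() returns a list with one entry per matching text node.
    return [text.strip() for text in tree.xpath(expression)]


# Expression from the answer.
names = extract_names(SNIPPET, "//a[@itemprop='name']//text()")

# Expression from the question, scoped to the "actors" container.
scoped_names = extract_names(
    SNIPPET, "//*[@itemprop='actors']//*[@itemprop='name']/text()"
)

print(names)
print(scoped_names)
```

The result is a plain Python list, so iterate over it or index it as needed; taking only element 0 would give just the first name.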
