Getting attributes with XPath

Date: 2016-08-03 14:12:58

Tags: python xpath scrapy

Given an HTML structure like this:

<dd itemprop="actors">
    <span itemscope="" itemtype="http://schema.org/Person">
        <a itemprop="name">Yumi Kazama</a>,
    </span>
    <span itemscope="" itemtype="http://schema.org/Person">
        <a itemprop="name">Yuna Mizumoto</a>,
    </span>
    <span itemscope="" itemtype="http://schema.org/Person">
        <a itemprop="name">Rei Aoki</a>,
    </span>
</dd>

How do I get the a/text() values of all the itemprop="name" elements?

With this XPath:

//*[@itemprop='actors']//*[@itemprop='name']/text()

I only get the first a/text().

1 answer:

Answer 0 (score: 1)

Assuming your HTML file is test.html, the following should work:

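from lxml import html

# Read the saved HTML file (adjust the path to wherever test.html lives).
with open(r'E:/backup/GoogleDrive/py/scrapy/test.html', "r") as f:
    page = f.read()

# Parse the markup and collect the text of every <a itemprop="name"> element.
tree = html.fromstring(page)
names = tree.xpath("//a[@itemprop='name']//text()")
print(names)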
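
For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch that parses an inline copy of the HTML from the question. The names SNIPPET and actor_names are made up for illustration and do not come from the original post. It uses the XPath expression from the question unchanged, which suggests the expression itself matches all three elements when evaluated with lxml.

```python
from lxml import html

# Inline copy of the HTML from the question, so no file on disk is needed.
SNIPPET = """
<dd itemprop="actors">
    <span itemscope="" itemtype="http://schema.org/Person">
        <a itemprop="name">Yumi Kazama</a>,
    </span>
    <span itemscope="" itemtype="http://schema.org/Person">
        <a itemprop="name">Yuna Mizumoto</a>,
    </span>
    <span itemscope="" itemtype="http://schema.org/Person">
        <a itemprop="name">Rei Aoki</a>,
    </span>
</dd>
"""


def actor_names(markup):
    """Return the text of every itemprop="name" node under itemprop="actors"."""
    tree = html.fromstring(markup)
    # Same expression as in the question; xpath() returns a list of all matches.
    texts = tree.xpath("//*[@itemprop='actors']//*[@itemprop='name']/text()")
    # Strip surrounding whitespace in case the source markup is padded.
    return [t.strip() for t in texts]


names = actor_names(SNIPPET)
print(names)
```

This should print a list with all three names. The question does not show how the expression was evaluated in Scrapy, so the following is only a guess: if the spider called extract_first() (or get()) on the selector, it would return just the first match, whereas extract() (or getall()) returns every match.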