Given an HTML structure like this:
<dd itemprop="actors">
<span itemscope="" itemtype="http://schema.org/Person">
<a itemprop="name">Yumi Kazama</a>, </span>
<span itemscope="" itemtype="http://schema.org/Person">
<a itemprop="name">Yuna Mizumoto</a>, </span>
<span itemscope="" itemtype="http://schema.org/Person">
<a itemprop="name">Rei Aoki</a>, </span>
</dd>
How can I get the a/text() values for all elements with itemprop="name"?
This XPath:
//*[@itemprop='actors']//*[@itemprop='name']/text()
only returns the first a/text().
Answer 0 (score: 1)
Assuming your HTML file is test.html, the following should work:
from lxml import html

# Read the saved page from disk
with open(r'E:/backup/GoogleDrive/py/scrapy/test.html', "r") as f:
    page = f.read()

tree = html.fromstring(page)
# Collect the text of every <a> element whose itemprop is "name"
names = tree.xpath("//a[@itemprop='name']//text()")
print(names)
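
To check the selection logic without a file on disk, here is a minimal sketch that runs against the snippet from the question as an in-memory string. It swaps lxml for the standard library's xml.etree.ElementTree, which supports only a limited XPath subset (no text() step), so the text is read from each element's .text attribute instead. This only works because the snippet happens to be well-formed XML; real-world HTML generally still needs lxml or a similar parser. The names SNIPPET and extract_names are hypothetical, chosen for illustration.

```python
import xml.etree.ElementTree as ET

# The markup from the question, inlined so the example is self-contained.
SNIPPET = """<dd itemprop="actors">
<span itemscope="" itemtype="http://schema.org/Person">
<a itemprop="name">Yumi Kazama</a>, </span>
<span itemscope="" itemtype="http://schema.org/Person">
<a itemprop="name">Yuna Mizumoto</a>, </span>
<span itemscope="" itemtype="http://schema.org/Person">
<a itemprop="name">Rei Aoki</a>, </span>
</dd>"""


def extract_names(markup):
    """Return the text of every <a itemprop="name"> element in the markup."""
    root = ET.fromstring(markup)
    # ElementTree's XPath subset has no text() step, so read .text directly.
    anchors = root.findall(".//a[@itemprop='name']")
    return [a.text.strip() for a in anchors if a.text]


print(extract_names(SNIPPET))
# ['Yumi Kazama', 'Yuna Mizumoto', 'Rei Aoki']
```

All three names come back for this snippet, so if only the first one appears in practice, the cause may lie in how the result is consumed or in how the live page differs from the snippet, rather than in the attribute predicate itself.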