I am using XPath to try to get the title text from the following part of a table:
<td class="title" title="if you were in a job and then one day, the work..." data-id="3198695">
<span id="thread_3198695" class="titleline threadbit">
<span class="prefix">
</span>
<a id="thread_title_3198695" href="showthread.php?t=3198695">would this creep you out?</a>
<span class="thread-pagenav">(Pgs:
<span><a href="showthread.php?t=3198695">1</a></span> <span><a href="showthread.php?t=3198695&page=2">2</a></span> <span><a href="showthread.php?t=3198695&page=3">3</a></span> <span><a href="showthread.php?t=3198695&page=4">4</a></span>)</span>
</span>
<span class="byline">
by
<a href="member.php?u=1687137" data-id="3198695" class="username">
damoni
</a>
</span>
</td>
The output I want is: "if you were in a job and then one day, the work..."
I have been trying various expressions in Scrapy (Python) to get the title. The following expression only outputs strange whitespace text, for example: '\n\n \r \r \n \n\n\r'
response.xpath("//tr[3]/td[@class='title']/text()")
I know that at least the following part is correct (I verified with Chrome's developer tools that it finds the right table element):
//tr[3]/td
# (This is the above snippet)
Any ideas on how to extract the title?
Answer 0 (score: 2)
You want:
response.xpath("//tr[3]/td[@class='title']/@title")
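Note that text() selects the text nodes that are direct children of a node, whereas @attribute selects the value of an attribute. In your case the only text directly inside the td is the whitespace between its child elements, which is why you get strings like '\n\n \r'. Since the text you want is stored in the title attribute, you need to use @title.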
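To see the difference outside of Scrapy, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree instead of Scrapy's selectors. It is a stand-in for illustration, not what Scrapy runs internally. The markup is a trimmed, well-formed version of the td from the question (the page-navigation links are omitted, and the wrapping tr is an assumption). ElementTree's `.text` only roughly corresponds to `text()`: it returns just the text before the first child element, but that is enough to show where the whitespace comes from.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed version of the <td> from the question (illustrative only).
SNIPPET = """<tr>
<td class="title" title="if you were in a job and then one day, the work..." data-id="3198695">
<span id="thread_3198695" class="titleline threadbit">
<a id="thread_title_3198695" href="showthread.php?t=3198695">would this creep you out?</a>
</span>
<span class="byline">by <a href="member.php?u=1687137" class="username">damoni</a></span>
</td>
</tr>"""

root = ET.fromstring(SNIPPET)

# Locate the cell the same way the XPath predicate [@class='title'] does.
td = root.find("./td[@class='title']")

# Rough analogue of td/text(): the text sitting directly inside <td>,
# which here is only the whitespace before the first <span>.
direct_text = td.text

# Analogue of td/@title: the value of the title attribute.
title_attr = td.get("title")

print(repr(direct_text))
print(title_attr)
```

In Scrapy itself, calling `.get()` on the result of the expression above should return the same attribute string rather than a list of selectors.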