XPath: How do I get the "title" text from a table?

Time: 2015-03-14 18:49:37

Tags: html xpath

I'm using XPath to try to get the title text from the following part of a table:

    <td class="title" title="if you were in a job and then one day, the work..." data-id="3198695">
        <span id="thread_3198695" class="titleline threadbit">

            <span class="prefix">



            </span>
            <a id="thread_title_3198695" href="showthread.php?t=3198695">would this creep you out?</a>

            <span class="thread-pagenav">(Pgs:
                 <span><a href="showthread.php?t=3198695">1</a></span> <span><a href="showthread.php?t=3198695&amp;page=2">2</a></span> <span><a href="showthread.php?t=3198695&amp;page=3">3</a></span> <span><a href="showthread.php?t=3198695&amp;page=4">4</a></span>)</span>

        </span>
        <span class="byline">


                by
                <a href="member.php?u=1687137" data-id="3198695" class="username">
                    damoni
                </a>

        </span>

    </td>

The output I want is "if you were in a job and then one day, the work..."

I've been trying various expressions in Scrapy (Python) to get the title. The one below outputs odd text such as: '\n\n \r \r \n \n\n\r'

    response.xpath("//tr[3]/td[@class='title']/text()")

I know that at least the following part is correct (I verified with Chrome's developer tools that it finds the right table element):

    //tr[3]/td
    # (this selects the snippet above)

Any ideas on how to extract the title?

1 Answer:

Answer 0: (score: 2)

You want:

    response.xpath("//tr[3]/td[@class='title']/@title")

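Note that text() selects the text nodes that are direct children of an element, whereas @attribute selects the value of an attribute. For this td, the direct text nodes are only the whitespace between its child elements, which is why you got the odd output. Since the text you want is stored in the title attribute, you need to use @title.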
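
To see the difference without running a spider, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree rather than Scrapy. It is an illustration only: the SNIPPET string is a trimmed, well-formed stand-in for the markup in the question, and the variable names are my own. ElementTree has no text() or @attr XPath steps, so the sketch reads the element's .text and .get("title") instead, which correspond roughly to the first direct text node and to the attribute value.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the table cell in the question
# (an assumption for illustration, not the full page).
SNIPPET = """
<table>
  <tr>
    <td class="title" title="if you were in a job and then one day, the work..." data-id="3198695">
      <span class="titleline threadbit">
        <a id="thread_title_3198695" href="showthread.php?t=3198695">would this creep you out?</a>
      </span>
    </td>
  </tr>
</table>
"""

root = ET.fromstring(SNIPPET.strip())
cell = root.find(".//td[@class='title']")

# Text sitting directly inside <td>, before its first child element.
# Here it is only whitespace, like the odd output in the question.
direct_text = cell.text

# The wanted string lives in the title attribute of <td>.
title_attr = cell.get("title")

# The visible thread title belongs to the nested <a>, not to <td> itself.
link_text = cell.find(".//a").text

print(repr(direct_text))
print(title_attr)
print(link_text)
```

In Scrapy itself, the xpath() call returns a selector list, so you would still call .get() (or .extract_first()) on the result of the @title expression to obtain the string.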