I am trying to extract data. This is the relevant part of the HTML:
<div class="readable">
<span id="freeTextContainer2123443890291117716">I write because I need to. <br>I review because I want to.
<br>I pay taxes because I have to.
<br><br>If you want to follow me, my username is @colleenhoover pretty much everywhere except my email, which is colleenhooverbooks@gmail.com
<br><br>Founder of
<a target="_blank" href="http://www.thebookwormbox.com" rel="nofollow">www.thebookwormbox.com</a>
<br><br></span>
</div>
I want output like this:
I write because I need to.
I review because I want to.
I pay taxes because I have to.
If you want to follow me, my username is @colleenhoover pretty much everywhere except my email, which is colleenhooverbooks@gmail.com
Founder of www.thebookwormbox.com
This is what I am trying:
base = '//div[@id="aboutAuthor"]/div[@class="bigBoxBody"]/div[@class="bigBoxContent containerWithHeaderContent"]/div[@class="readable"]'
span_index = 1 if len(response.xpath(base + '/span')) == 1 else 2  # use the second <span> when there are several
aboutauthor = response.xpath(base + '/span[%d]/text()' % span_index).extract()
print(aboutauthor)
The output I get:
[u'I write because I need to. ', u'I review because I want to. ', u'I pay taxes because I have to. ', u'If you want to follow me, my username is @colleenhoover pretty much everywhere except my email, which is colleenhooverbooks@gmail.com', u'Founder of ', u' ']
How can I do this so that www.thebookwormbox.com also appears in the output?
Answer 0 (score: 2)
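As mentioned in my comment, you can use an XPath ending in //text() to get the text content of all descendant nodes. A path ending in /text() returns only the text nodes that are direct children of the span, which is why the link text inside the <a> element is missing.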