Scrapy - Python - Scrape all text in a div that contains p tags

Date: 2018-10-17 13:38:30

Tags: python html scrapy

I am trying to scrape the text from this div:

<div class="col-lg-6">
    <h3 class="c-panel__section-heading">Reply</h3>
    <div class="textAreaContainer closed">

        <div contenteditable="true" class="customTextArea" id="Message" name="Message">
            <p>Dear Customer,&nbsp;</p>
            <p>the&nbsp;<span style="background-color: rgb(238, 238, 238);">MFDeviceMT.dll is a Matrox driver related dll, if you're not using a MATROX card on the server where you encountered the issue you can temporarily ignore it.</span></p>
            <p><span style="background-color: rgb(238, 238, 238);">We have however forwarder the problem to our developing team, thank you for the feedback.</span></p>
            <p><span style="background-color: rgb(238, 238, 238);">Best Regards.</span></p>
            -------------- -------------- ----------- Email send to: martin.bonato@brasvideo.com;b2w.shoptime@brasvideo.com Email send cc: supporto@etere.com
        </div>
    </div>
</div>

But now the site has started wrapping the message in <p> tags, and I can no longer scrape all the text. I am using this command:

sel.xpath('//*[@id="Message"]/text()').extract()[-1]

It returns only the text after the last <p> tag.

So how can I scrape all the text in the div, including the text inside the p tags?

2 answers:

Answer 0: (score: 0)

Do you want to scrape the text of each p separately? Iterate over them:

paragraphs = []
for p in sel.css('#Message p'):
    # Join the text of this <p> and of any elements nested inside it
    paragraphs.append("".join(p.css("*::text").extract()))

Answer 1: (score: 0)

I do it this way:

# //text() collects every text node below the selected div
ultima_msg = sel.xpath('//*[@id="solutionsContainer"]/div[last()]/div[last()]/div//text()').extract()
limpa_msg = ''
for i_msg in ultima_msg:
    limpa_msg = limpa_msg + i_msg.strip()
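One thing to watch with a loop like this: stripping each fragment and concatenating without a separator can glue neighbouring words together. A possible variation, sketched below, joins the fragments with spaces and then collapses all whitespace. The clean_text helper and the sample fragments are hypothetical and only illustrate the idea.

```python
def clean_text(fragments):
    """Join extracted text fragments into one whitespace-normalized string."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # including the non-breaking space (\xa0) that &nbsp; decodes to.
    words = " ".join(fragments).split()
    return " ".join(words)


# Hypothetical fragments, shaped like the output of .extract() on //text()
fragments = [
    "\n    ",
    "Dear Customer,\xa0",
    "\n    ",
    "the\xa0",
    "MFDeviceMT.dll is a Matrox driver related dll.",
    "\n",
]
message = clean_text(fragments)
print(message)
```

The same helper could be applied to the list returned by the //text() query instead of the manual loop.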

I think your approach is simpler, though.

Thanks, everyone.