I am trying to scrape the text from this div:
<div class="col-lg-6">
<h3 class="c-panel__section-heading">Reply</h3>
<div class="textAreaContainer closed">
<div contenteditable="true" class="customTextArea" id="Message" name="Message">
<p>Dear Customer, </p>
<p>the <span style="background-color: rgb(238, 238, 238);">MFDeviceMT.dll is a Matrox driver related dll, if you're not using a MATROX card on the server where you encountered the issue you can temporarily ignore it.</span></p>
<p><span style="background-color: rgb(238, 238, 238);">We have however forwarder the problem to our developing team, thank you for the feedback.</span></p>
<p><span style="background-color: rgb(238, 238, 238);">Best Regards.</span></p>
-------------- -------------- ----------- Email send to: martin.bonato@brasvideo.com;b2w.shoptime@brasvideo.com Email send cc: supporto@etere.com
</div>
</div>
</div>
But the site has now started wrapping the text in <p> tags, and I can no longer scrape all of it. I am using this command:
sel.xpath('//*[@id="Message"]/text()').extract()[-1]
It returns only the text after the last <p> tag.
So how can I scrape all the text in the div, including the text inside the <p> tags?
Answer 0: (score: 0)
Do you want to scrape the text of each <p> separately? Then iterate over them:
for p in sel.css('#Message p'):
    # join every text node inside this <p>, including text in nested <span> tags
    all_text = "".join(p.css("*::text").extract())
Answer 1: (score: 0)
This is how I do it:
ultima_msg = sel.xpath('//*[@id="solutionsContainer"]/div[last()]/div[last()]/div//text()').extract()
limpa_msg = ""  # accumulator for the cleaned-up message
for i_msg in ultima_msg:
    limpa_msg = limpa_msg + i_msg.strip()
I think your approach is simpler.
But thanks, everyone.