Python Scrapy: get all the text of child elements while ignoring <br>

Date: 2016-11-26 00:59:04

Tags: python scrapy

Hello, I am trying to scrape data from a web page:

<div id="print">
  .
  .
  <div id="item">
    <div class="span3 col-3">
       Processor: 6th Gen. Intel Core i5 6200U
       <br>
       Clock speed: 2.30-2.80GHz
       <br>
    </div>
  </div>
  <div id="item">
  .
  .
  </div>
</div>

When I use

for res in response.css('div#print'):
    text = res.css("div#item div.col-3::text").extract()

Output:

u'Processor:\xa07th Gen. Intel Core i5 7200U ', u'Clock speed:\xa02.50-3.10GHz '

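I get 2 elements, because the text is split at each <br> tag. How can I get the whole text as a single piece, across the <br> tags? Thanks.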

1 answer:

Answer 0 (score: 0)

You should try removing ::text from your script.

Code:

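import html2text  # third-party package that converts HTML to plain text
for res in response.css('div#print'):
    # Without ::text, extract() returns each matched element's full HTML, <br> tags included
    for html in res.css("div#item div.col-3").extract():  # removed ::text
        print(html2text.html2text(html))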
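
If you would rather not add a dependency, another option is to keep ::text and join the text nodes yourself. The following is only a rough sketch: join_text_nodes is a hypothetical helper name (not part of Scrapy), and the sample list is the output quoted in the question. It replaces the non-breaking spaces (\xa0), trims each node, and drops nodes that are empty after trimming.

```python
def join_text_nodes(nodes):
    """Join extracted text nodes into one string, one node per line."""
    # Replace non-breaking spaces with regular spaces and trim each node.
    cleaned = (node.replace(u"\xa0", u" ").strip() for node in nodes)
    # Skip whitespace-only nodes, which can appear between <br> tags.
    return u"\n".join(part for part in cleaned if part)


# Sample input: the list quoted in the question's output.
nodes = [
    u'Processor:\xa07th Gen. Intel Core i5 7200U ',
    u'Clock speed:\xa02.50-3.10GHz ',
]
combined = join_text_nodes(nodes)
print(combined)
```

In the spider, you would pass the result of res.css("div#item div.col-3::text").extract() to the helper instead of the sample list.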