Hello, I am trying to scrape data from a web page:
<div id="print">
.
.
<div id="item">
<div class="span3 col-3">
Processor: 6th Gen. Intel Core i5 6200U
<br>
Clock speed: 2.30-2.80GHz
<br>
</div>
</div>
<div id="item">
.
.
</div>
</div>
When I use
for res in response.css('div#print'):
    text = res.css("div#item div.col-3::text").extract()
I get this output:
u'Processor:\xa07th Gen. Intel Core i5 7200U ', u'Clock speed:\xa02.50-3.10GHz '
I get 2 elements. How can I get the whole text, including the <br> tags? Thanks.
Answer 0 (score: 0)
You should try removing ::text from your script, so that the selector returns the element's HTML with its <br> tags, and then convert that HTML to text. For example:
import html2text  # third-party package that converts HTML to plain text

for res in response.css('div#print'):
    # Select the element itself (no ::text) so the <br> tags are kept
    text = res.css("div#item div.col-3").extract()
    print(html2text.html2text(text[0]))
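
If installing html2text is not an option, a different approach (not part of the answer above) is to keep ::text and join the fragments yourself. The sketch below assumes the list has the shape shown in the question's output; join_text_nodes is a hypothetical helper name, not a Scrapy function.

```python
def join_text_nodes(nodes):
    """Join text fragments that were split at <br> into one string.

    Hypothetical helper for illustration; `nodes` is assumed to be the
    list returned by `.css("...::text").extract()`.
    """
    cleaned = []
    for node in nodes:
        # Replace non-breaking spaces (\xa0) with plain spaces and trim.
        line = node.replace(u"\xa0", u" ").strip()
        if line:  # skip fragments that contained only whitespace
            cleaned.append(line)
    # One line per fragment, mirroring where the <br> tags were.
    return u"\n".join(cleaned)


# The two fragments copied from the question's output.
fragments = [
    u"Processor:\xa07th Gen. Intel Core i5 7200U ",
    u"Clock speed:\xa02.50-3.10GHz ",
]
print(join_text_nodes(fragments))
```

In the spider this would be called as join_text_nodes(res.css("div#item div.col-3::text").extract()). Unlike the html2text version, it only sees direct text nodes, so text inside nested tags would need a broader selector.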