When scraping text from a website with Scrapy, all of the text is extracted except the link text. How can I fix this?

Time: 2020-03-10 13:30:40

Tags: python web-scraping scrapy

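When I scrape the data, not all of the content is extracted: the text inside the <a> tag is skipped. How can I extract that link text, and the tag's href, in the same pass?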

HTML code

<p class="gnt_ar_b_p">
24/7 Tempo has compiled a list of drugs in short supply from information provided by the 
   <p class="gnt_ar_b_p">
   However, drugs are frequently announced to be in short supply. In 
   fact, the FDA has a running list of drug shortages due to anything from increasing demand 
   to regulatory factors as well as supply disruptions.
   </p>
  <a href="https://www.accessdata.fda.gov/scripts/drugshortages/default.cfm" data-t-l="|inline|intext|n/a" class="gnt_ar_b_a">
   Food and Drug Administration</a>.
</p>

Shell

response.css('p.gnt_ar_b_p').xpath("text()").extract()

Output

24/7 Tempo has compiled a list of drugs in short supply from information provided by the 
However, drugs are frequently announced to be in short supply. In 
fact, the FDA has a running list of drug shortages due to anything from increasing demand 
to regulatory factors as well as supply disruptions.

0 answers:

No answers