Scraping data from a list of URLs

Time: 2019-12-24 22:39:33

Tags: python web-scraping

I have a list of URLs, and I would like to web-scrape every URL in that list.

import urllib.request
from bs4 import BeautifulSoup

def soup():
    soups = []
    for url in website_list:
        sauce = urllib.request.urlopen(url)
        print(url)
        # Parse the whole response; returning inside the loop
        # would stop after the first URL.
        soup_maker = BeautifulSoup(sauce, 'html.parser')
        soups.append(soup_maker)
    return soups

I tried something like this. Can you help me with the next step?
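For reference, a minimal self-contained sketch of the fetch-then-parse loop described above, using only urllib and BeautifulSoup. The function names `make_soup` and `scrape_all` are illustrative, not from the original post:

```python
import urllib.request
from bs4 import BeautifulSoup

def make_soup(html):
    """Parse raw HTML (bytes or str) into a BeautifulSoup tree."""
    return BeautifulSoup(html, 'html.parser')

def scrape_all(website_list):
    """Fetch each URL in the list and return one parsed soup per page."""
    soups = []
    for url in website_list:
        # Context manager ensures the connection is closed after reading.
        with urllib.request.urlopen(url) as response:
            soups.append(make_soup(response.read()))
    return soups
```

Separating parsing (`make_soup`) from fetching (`scrape_all`) makes the parsing step easy to test without network access.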

1 answer:

Answer 0 (score: 0)

Here is an example that uses the simplified_scrapy framework to do the downloading. You need to install simplified_scrapy first: pip install simplified_scrapy

from simplified_scrapy.spider import Spider, SimplifiedDoc
from simplified_scrapy.simplified_main import SimplifiedMain
class DemoSpider(Spider):
  name = 'demo-spider'
  start_urls = ['http://example.com'] # Replace with your website_list 

  def extract(self, url, html, models, modelNames):
    try:
      doc = SimplifiedDoc(html)
      print (doc.title)
    except Exception as e:
      print ('extract',e)

SimplifiedMain.startThread(DemoSpider())  # start scraping