I have a list of URLs, and I want to web-scrape every URL in that list.
import urllib.request
from bs4 import BeautifulSoup

def soup():
    soups = []
    for url in website_list:
        print(url)
        # Fetch the page and parse the whole response body at once
        sauce = urllib.request.urlopen(url)
        soup_maker = BeautifulSoup(sauce.read(), 'html.parser')
        soups.append(soup_maker)
    # Return one parsed soup per URL instead of stopping after the first page
    return soups
I tried something like this. Can you help me with the next step?
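For context, this is roughly how I expect to drive the function above; the URLs here are just placeholders, not my real list:

# Placeholder URLs for illustration only; substitute the real list
website_list = [
    'https://example.com',
    'https://example.org',
]

for page_soup in soup():
    # For example, print each page's <title> element
    print(page_soup.title)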
Answer 0 (score: 0)
Here is an example of downloading pages with the simplified_scrapy framework. You need to install simplified_scrapy first: pip install simplified_scrapy
from simplified_scrapy.spider import Spider, SimplifiedDoc
from simplified_scrapy.simplified_main import SimplifiedMain

class DemoSpider(Spider):
    name = 'demo-spider'
    start_urls = ['http://example.com']  # Replace with your website_list

    def extract(self, url, html, models, modelNames):
        try:
            # Parse the downloaded HTML and print the page title
            doc = SimplifiedDoc(html)
            print(doc.title)
        except Exception as e:
            print('extract', e)

SimplifiedMain.startThread(DemoSpider())  # start scraping
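If you want to feed in your own website_list, one way is a rough sketch like the following, assuming website_list is a plain Python list of URL strings defined next to the spider (the names ListSpider and the URLs below are placeholders of mine):

# Hypothetical reuse of the question's website_list (assumed to be a list of URL strings)
website_list = ['https://example.com', 'https://example.org']  # placeholder URLs

class ListSpider(DemoSpider):
    name = 'list-spider'
    start_urls = website_list  # the framework downloads every URL in this list

SimplifiedMain.startThread(ListSpider())  # start scraping the whole list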