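I'm trying to scrape a site whose URLs end in a number that starts at 1 and counts up. I tried to use `range` to iterate over those URLs, but something isn't working and I'm not sure what the problem is...
To be honest, I adapted this from existing code, and so far I have only barely managed to make it read a CSV file containing a list of URLs.
The URLs look like this: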
https://www.mobygames.com/developer/sheet/view/developerId,1
https://www.mobygames.com/developer/sheet/view/developerId,2
https://www.mobygames.com/developer/sheet/view/developerId,3
https://www.mobygames.com/developer/sheet/view/developerId,4
The Jupyter notebook shows no results at all, not even an error message, which worries me more.
````
import bs4 as bs
import urllib.request
import csv
import numpy as np

base_url = "https://www.mobygames.com/developer/sheet/view/developerId,"
url_list = []


def extract(gameurl):
    req = urllib.request.Request(gameurl, headers={'User-Agent': 'Mozilla/5.0'})
    sauce = urllib.request.urlopen(req).read()
    soup = bs.BeautifulSoup(sauce, 'lxml')
    infopage = soup.find_all("div", {"class": "col-md-8 col-lg-8"})
    core_list = []
    for credits in infopage:
        niceHeaderTitle = credits.find_all("h1", {"class": "niceHeaderTitle"})
        name = niceHeaderTitle[0].text
        Titles = credits.find_all("h3", {"class": "clean"})
        Titles = [title.get_text() for title in Titles]
        tr = credits.find_all("tr")
        for i in range(len(tr)):
            row = tr[i].get_text(strip=True)
            if row in Titles:
                title = row
            elif len(row) > 1:
                games = [name, title, row]
                core_list.append(games)
    core_list = np.matrix(core_list)
    return core_list


def csv_write(url_data):
    with open('human_resource.csv', 'a', encoding='utf-8') as file:
        writer = csv.writer(file)
        for row in url_data:
            writer.writerow(row)


for url in url_list:
    url = range(1, 100)
    link = base_url + url
    url_data = extract(link)
    csv_write(url_data)
````
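A likely reason for the silent run: `url_list` is still empty when the final `for` loop starts, so the loop body never executes and nothing is fetched, written, or raised. If the body did run, `base_url + url` would raise a `TypeError`, because `url` is a `range` object rather than a string. Below is a minimal sketch of one way to restructure the loop. The names `build_urls` and `scrape_all` are hypothetical helpers, not part of the original code, and the sketch assumes `extract` returns a plain list of rows (that is, without the `np.matrix` conversion).

```python
import csv

# Base URL copied from the question.
BASE_URL = "https://www.mobygames.com/developer/sheet/view/developerId,"


def build_urls(base_url, start, stop):
    """Return one URL per ID in [start, stop), converting each ID to str."""
    return [base_url + str(dev_id) for dev_id in range(start, stop)]


def scrape_all(urls, extract, out_path):
    """Call extract(url) for every URL and append the returned rows to a CSV.

    `extract` is assumed to return a list of rows (plain lists).
    Returns a list of (url, exception) pairs for pages that failed,
    so failures are reported instead of passing silently.
    """
    failures = []
    # newline="" stops the csv module from writing blank lines on Windows.
    with open(out_path, "a", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        for url in urls:
            try:
                rows = extract(url)
            except Exception as exc:  # e.g. an HTTP error for an unused ID
                failures.append((url, exc))
                continue
            writer.writerows(rows)
    return failures
```

With the question's `extract` function, a call might look like `failures = scrape_all(build_urls(BASE_URL, 1, 100), extract, "human_resource.csv")`. Printing `failures` afterwards shows which IDs could not be scraped.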