Question

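I am trying to scrape a website whose URLs end in a number, starting from 1. I tried to use `range` to iterate over the URLs, but something does not work and I am not sure what the problem is...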

To be honest, I adapted existing code, and I only barely managed to get it to read a CSV file containing a list of URLs.

The URLs look like this:

https://www.mobygames.com/developer/sheet/view/developerId,1
https://www.mobygames.com/developer/sheet/view/developerId,2
https://www.mobygames.com/developer/sheet/view/developerId,3
https://www.mobygames.com/developer/sheet/view/developerId,4

Jupyter Notebook shows no output at all, which worries me more than an error message would.

````python
import bs4 as bs
import urllib.request
import csv
import numpy as np

base_url = "https://www.mobygames.com/developer/sheet/view/developerId,"
url_list = []


def extract(gameurl):
    req = urllib.request.Request(gameurl, headers={'User-Agent': 'Mozilla/5.0'})
    sauce = urllib.request.urlopen(req).read()
    soup = bs.BeautifulSoup(sauce, 'lxml')
    infopage = soup.find_all("div", {"class": "col-md-8 col-lg-8"})
    core_list = []

    for credits in infopage:
        niceHeaderTitle = credits.find_all("h1", {"class": "niceHeaderTitle"})
        name = niceHeaderTitle[0].text

        Titles = credits.find_all("h3", {"class": "clean"})

        Titles = [title.get_text() for title in Titles]

        tr = credits.find_all("tr")

        for i in range(len(tr)):
            row = tr[i].get_text(strip=True)
            if row in Titles:
                title = row
            elif len(row) > 1:
                games = [name, title, row]
                core_list.append(games)

        core_list = np.matrix(core_list)

        return core_list


def csv_write(url_data):
    with open('human_resource.csv', 'a', encoding='utf-8') as file:
        writer = csv.writer(file)
        for row in url_data:
            writer.writerow(row)


for url in url_list:
    url = range(1, 100)
    link = base_url + url
    url_data = extract(link)
    csv_write(url_data)
````
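A likely reason nothing is printed: `url_list` is empty, so the body of `for url in url_list` never runs and no error is raised. If it did run, `base_url + url` would raise a `TypeError`, because a `str` cannot be concatenated with a `range` object. The snippet below is a minimal sketch of iterating over the numbers instead, not a definitive fix. `build_urls` and `scrape_all` are hypothetical helper names introduced for illustration; `extract` and `csv_write` are assumed to be the functions from the code above, passed in as arguments. The `None` check is there because `extract` falls through without a `return` when no matching `div` is found.

```python
from typing import Callable, Iterable, List

BASE_URL = "https://www.mobygames.com/developer/sheet/view/developerId,"


def build_urls(base_url: str, start: int, stop: int) -> List[str]:
    """Return base_url + n for every n in range(start, stop) (hypothetical helper)."""
    # range() yields ints, so each one is formatted into the string explicitly.
    return [f"{base_url}{developer_id}" for developer_id in range(start, stop)]


def scrape_all(urls: Iterable[str], extract: Callable, write: Callable) -> int:
    """Call extract(url) for each URL and hand non-empty results to write().

    Returns the number of pages whose rows were written (hypothetical helper).
    """
    processed = 0
    for url in urls:
        rows = extract(url)
        if rows is None:  # extract() found no matching block on this page
            continue
        write(rows)
        processed += 1
    return processed
```

With the question's functions this would be called as `scrape_all(build_urls(BASE_URL, 1, 100), extract, csv_write)`. Note that `range(1, 100)` stops at 99, so use `range(1, 101)` if ID 100 should be included.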

Using the range command to iterate over URLs for web scraping

0 answers: