Web Scraper - AttributeError: 'NoneType' object has no attribute 'text'

Posted: 2018-12-19 22:00:57

Tags: python-3.x web-scraping beautifulsoup

Hi, I'm trying to build a web scraper for the top-manga section of MyAnimeList (https://myanimelist.net/topmanga.php) using beautifulsoup and requests. My problem is that I get this error:

Traceback (most recent call last):
  File "C:/Kaan/Proje/Python/Programlar/Manga/manga2.py", line 119, in <module>
    mangainfo(new_url)
  File "C:/Kaan/Proje/Python/Programlar/Manga/manga2.py", line 29, in mangainfo
    names.append(name.text)
AttributeError: 'NoneType' object has no attribute 'text'

I understand what this error means; what I don't understand is why it happens at random. Say I get the error at manga number 20. If I restart the program, the code then runs fine until manga number 200 before throwing the error. While writing this post I ran the program again and got the error at manga 140.

How can I fix this? Is it caused by my code, or by the website?

import requests
from bs4 import Tag, BeautifulSoup
from time import time

url = "https://myanimelist.net/topmanga.php"

def mangainfo(url):
    manga_id, scores, manga_genre, names, authors = list(), list(), list(), list(), list()
    r = requests.get(url)
    c = r.content
    soup2 = BeautifulSoup(c, "html.parser")
    # Manga Links
    manga_links = soup2.find_all("a", class_="hoverinfo_trigger fs14 fw-b")
    count = 0
    for link in manga_links:
        start = time()
        r = requests.get(link["href"])
        c = r.content
        soup = BeautifulSoup(c, "html.parser")

        # Names
        name = soup.find("h1", class_="h1")
        names.append(name.text)

        # Scores
        score = soup.find("div", class_="fl-l score")
        scores.append(str(score.text.strip()))

        # Manga ID
        for x in link["href"].split("/"):
            if x.isdigit():
                manga_id.append(int(x))
                break

        # Manga Genres
        genre = soup.find("span", text="Genres:")
        manga_genre.append([x.text for x in genre.next_siblings if isinstance(x, Tag)])

        # Authors
        author = soup.find("span", text="Authors:")
        authors.append([x.text for x in author.next_siblings if isinstance(x, Tag)])

        stop = time()
        count += 1
        print("{} - Time: {:.2f} Link: {} ==> OK".format(count, stop - start, link["href"]))

for i in range(0, 46651, 50):  # 46651 limit
    new_url = url + "?limit=" + str(i)
    print("Sayfa: {} ===>  {}".format(str(i), new_url))
    start_time = time()
    mangainfo(new_url)
    stop_time = time()
    print("MangaInfo Time: {:.2f}".format(stop_time - start_time))
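Since the question has no posted answers, here is one hedged guess: the logic is probably fine, but firing thousands of requests in quick succession can make the site intermittently return a rate-limit or error page, on which `soup.find(...)` matches nothing and returns `None`, so `.text` raises `AttributeError` at an unpredictable point. A minimal defensive sketch (the helper names `safe_text` and `retry` are my own, not from the post) that guards against `None` and retries a failed step:

```python
import time


def safe_text(node, default="N/A"):
    # BeautifulSoup's find() returns None when the tag is absent
    # (e.g. on an error page); guard before touching .text instead
    # of letting AttributeError crash the whole run.
    return node.text.strip() if node is not None else default


def retry(fn, attempts=3, delay=2.0):
    # Re-run a fetch/parse step a few times with a pause, in case
    # the failure was a transient rate-limit response from the site.
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(delay)


# In the scraper loop one would then write, for example:
#     name = soup.find("h1", class_="h1")
#     names.append(safe_text(name))
# and also check r.status_code == 200 before parsing at all.
```

Adding a short `time.sleep()` between requests would likely reduce how often the site refuses to serve the page in the first place.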

0 Answers:

No answers yet.