Question

I am trying to scrape a web page with Python, but I have run into a problem. Here is my code:

from urllib import request
from bs4 import BeautifulSoup

pageURL="https://gamesnacks.com/embed/games/omnomrun"
rawPage=request.urlopen(pageURL)

soup=BeautifulSoup(rawPage, "html5lib")

content=soup.article

linksList=[]


for link in content.find_all('a'):
    url=link.get("href")
    img=link.get("src")
    text=link.span.text

linksList.append({"url":"url","img":"img","text":"text"})

try:
    url=link.get("href")
    img=link.get("src")
    text=link.span.text
    linksList.append({"url":"url","img":"img","text":"text"})
except AttributeError:
    pass

import json

with open("links.json","w",encoding="utf-8") as links_file:
    json.dump(linksList,links_file,ensure_ascii=False)

print("the work is done")

It raises an error on the line for link in content.find_all('a'):

I have already tried some online help, without success.

Answer 1

You define content as soup.article, but soup.article is None here, so you get this error:

Traceback (most recent call last):
  File "main.py", line 14, in <module>
    for link in content.find_all('a'):
AttributeError: 'NoneType' object has no attribute 'find_all'

Because None is not a BeautifulSoup object, it has no methods such as find_all().
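A minimal sketch of that behaviour follows. The inline HTML string is a made-up stand-in for the real page (an assumption for illustration), and the built-in "html.parser" is used instead of html5lib so nothing extra needs installing:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with no <article> tag, standing in for the real page.
html = "<html><body><div><a href='/play'><span>Play</span></a></div></body></html>"
soup = BeautifulSoup(html, "html.parser")

# soup.article is shorthand for soup.find("article"); it yields None
# when the document contains no such tag.
content = soup.article
print(content)

try:
    content.find_all("a")
except AttributeError as exc:
    # Same failure as in the traceback above.
    message = str(exc)
    print(message)
```

Checking `content is None` before the loop is enough to turn the crash into a case you can handle explicitly.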

You need a more reliable way to locate the element you expected article to be.

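Try soup.find_all("article") and iterate over the result, in case the site contains several article tags. However, judging from visiting the site and inspecting its source, I cannot see an <article> tag anywhere, which is why soup.article is None. Even find_all("article") will most likely not return anything useful.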
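One way to make the loop tolerant of a missing article is sketched below. The helper name extract_links, the fallback to the whole document, and the sample markup are all assumptions for illustration; they are not taken from the real page. The sketch also assumes the image URL lives on an <img> inside the link, since an <a> tag normally has no src attribute.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Collect url/img/text for every <a>, tolerating missing tags."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns an empty list when nothing matches, so fall back
    # to searching the whole document (hypothetical design choice).
    containers = soup.find_all("article") or [soup]
    links = []
    for container in containers:
        for link in container.find_all("a"):
            img = link.find("img")
            span = link.find("span")
            links.append({
                "url": link.get("href"),
                "img": img.get("src") if img else None,
                "text": span.get_text(strip=True) if span else link.get_text(strip=True),
            })
    return links


# Made-up markup without any <article> tag.
sample = (
    '<div><a href="/play"><img src="icon.png"><span>Play</span></a>'
    '<a href="/about">About</a></div>'
)
result = extract_links(sample)
print(result)
```

Note that the dictionary stores the variables themselves rather than the string literals "url", "img" and "text" used in the question, which would otherwise write the same three words for every link.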
