BeautifulSoup not working, getting a NoneType error

Date: 2016-03-01 21:25:35

Tags: python html python-3.x beautifulsoup html-parsing

I am using the following code (taken from retrieve links from web page using python and BeautifulSoup):

import httplib2
from BeautifulSoup import BeautifulSoup, SoupStrainer

http = httplib2.Http()
status, response = http.request('http://www.nytimes.com')

for link in BeautifulSoup(response, parseOnlyThese=SoupStrainer('a')):
    if link.has_attr('href'):
        print link['href']

However, I don't understand why I get the following error message:

Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
    if link.has_attr('href'):
TypeError: 'NoneType' object is not callable

BeautifulSoup 3.2.0 Python 2.7

Edit

I tried the solution from a similar question (Type error if link.has_attr('href'): TypeError: 'NoneType' object is not callable), but it gives me the following error:

Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 12, in <module>
    for link in BeautifulSoup(response).find_all('a', href=True):
TypeError: 'NoneType' object is not callable

1 answer:

Answer 0: (score: 5)

First of all:

from BeautifulSoup import BeautifulSoup, SoupStrainer

You are using BeautifulSoup version 3, which is no longer maintained. Switch to BeautifulSoup version 4. Install it with:

pip install beautifulsoup4

and change the import to:

from bs4 import BeautifulSoup
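
If you want to double-check which version actually gets imported, a quick sanity check (bs4 exposes its version string as bs4.__version__):

import bs4
print(bs4.__version__)  # should print 4.x.y if beautifulsoup4 is the package being used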

Also:

Traceback (most recent call last):
  File "C:\Users\EANUAMA\workspace\PatternExtractor\src\SourceCodeExtractor.py", line 13, in <module>
    if link.has_attr('href'):
TypeError: 'NoneType' object is not callable

Here, link is a Tag instance, and in BeautifulSoup 3 a Tag has no has_attr method. That means that, remembering what dot notation means in BeautifulSoup, it tries to search for a child element named has_attr inside link, which finds nothing. In other words, link.has_attr is None, and obviously calling None('href') results in the error.
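
A minimal sketch of that dot-notation behaviour, using BeautifulSoup 4 just for illustration (the markup and the made-up no_such_child name are only examples):

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p><a href="/x">a link</a></p>', 'html.parser')
link = soup.a                    # dot notation: finds the first <a> tag
print(link.no_such_child)        # None -- dot lookup of a child tag that does not exist
print(link.has_attr('href'))     # True -- in bs4, Tag really does have has_attr()
# In BeautifulSoup 3, Tag has no has_attr method, so link.has_attr falls back to
# the same child-tag lookup as no_such_child above: it is None, and calling
# None('href') raises TypeError: 'NoneType' object is not callable.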

Instead, do:

soup = BeautifulSoup(response, parse_only=SoupStrainer('a', href=True))
for link in soup.find_all("a", href=True):
    print(link['href'])
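
With href=True in both the SoupStrainer and find_all(), only <a> tags that actually carry an href attribute are kept, so link['href'] can no longer fail; the SoupStrainer additionally keeps the parsed tree small by only turning matching tags into objects.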

FYI, here is the complete working code I used to debug the problem (using requests):

import requests
from bs4 import BeautifulSoup, SoupStrainer


response = requests.get('http://www.nytimes.com').content
for link in BeautifulSoup(response, parse_only=SoupStrainer('a', href=True)).find_all("a", href=True):
    print(link['href'])
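
And if you would rather keep httplib2 from the original snippet, the same fix looks roughly like this (a sketch, assuming beautifulsoup4 and httplib2 are installed):

import httplib2
from bs4 import BeautifulSoup, SoupStrainer

http = httplib2.Http()
# httplib2 returns a (response, content) pair; the content is what gets parsed.
response, content = http.request('http://www.nytimes.com')

soup = BeautifulSoup(content, parse_only=SoupStrainer('a', href=True))
for link in soup.find_all('a', href=True):
    print(link['href'])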