I wrote this code to extract all the text from a web page:
from BeautifulSoup import BeautifulSoup
import urllib2
soup = BeautifulSoup(urllib2.urlopen('http://www.pythonforbeginners.com').read())
print(soup.get_text())
The problem is that I get this error:
print(soup.get_text())
TypeError: 'NoneType' object is not callable
Any ideas on how to fix this?
Answer 0 (score: 6)
The method is called soup.getText(), i.e. it is camelCased.
Why you get a TypeError rather than an AttributeError is a mystery to me!
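One plausible explanation for the TypeError, assuming BeautifulSoup 3 treats an unknown attribute on a tag as a search for a child tag of that name: soup.get_text would then look for a <get_text> tag, find nothing, and return None, and calling None raises TypeError rather than AttributeError. The sketch below imitates that fallback with a toy class; ToyTag and everything in it are hypothetical stand-ins written for illustration, not BeautifulSoup's actual code.

```python
class ToyTag(object):
    """Hypothetical stand-in for a parsed tag; NOT the real BeautifulSoup class."""

    def __init__(self, name, children=None, text=""):
        self.name = name
        self.children = children or []
        self.text = text

    def find(self, name):
        # Depth-first search for the first descendant tag with this name.
        for child in self.children:
            if child.name == name:
                return child
            found = child.find(name)
            if found is not None:
                return found
        return None

    def getText(self):
        # Concatenate this tag's text with the text of all descendants.
        return self.text + "".join(c.getText() for c in self.children)

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        # Assumed behaviour: treat the unknown name as a child-tag lookup.
        if name.startswith("__"):
            raise AttributeError(name)
        return self.find(name)


soup = ToyTag("html", [ToyTag("body", [ToyTag("p", text="hello")])])

print(soup.getText())    # real method: returns the text
print(soup.p.getText())  # attribute fallback finds the <p> child
print(soup.get_text)     # no <get_text> tag exists, so this is None

try:
    soup.get_text()      # equivalent to None()
except TypeError as exc:
    error_message = str(exc)
    print("TypeError: " + error_message)
```

If that assumption holds, the error in the question is simply the result of calling None. The newer bs4 package does provide get_text(), so the original spelling works there.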
Answer 1 (score: 0)
As Markku suggested in the comments, I suggest breaking your code up into separate steps.
from BeautifulSoup import BeautifulSoup
import urllib2
URL = "http://www.pythonforbeginners.com"
page = urllib2.urlopen(URL)
html = page.read()
soup = BeautifulSoup(html)
print(soup.get_text())
If it still doesn't work, add some print statements to see what is going on.
from BeautifulSoup import BeautifulSoup
import urllib2
URL = "http://www.pythonforbeginners.com"
print("URL is {} and its type is {}".format(URL,type(URL)))
page = urllib2.urlopen(URL)
print("Page is {} and its type is {}".format(page,type(page)))
html = page.read()
print("html is {} and its type is {}".format(html,type(html)))
soup = BeautifulSoup(html)
print("soup is {} and its type is {}".format(soup,type(soup)))
print(soup.get_text())