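I'm using Python 2.7 and trying to scrape a website that contains tables. I keep getting this error message: AttributeError: addinfourl instance has no attribute 'findAll'
Am I using findAll incorrectly? Thanks!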
wind = urllib2.urlopen('http://w1.weather.gov/data/obhistory/KCQX.html')
# print(third_page)
tables = wind.findAll('table')
data_table = tables[3]
rows = data_table.findAll('tr')
output_matrix = []
for row in rows:
    subrow = row.findAll('td')
    new_row = []
    if(len(subrow)>0):
        temp_row = []
        for subsubrow in subrow:
            temp_row.append(subsubrow.get_text().strip())
        output_matrix.append(temp_row)
Answer 0 (score: 0)
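The wind variable is a file-like object returned by urllib2.urlopen, and it has no findAll method. That method belongs to BeautifulSoup objects, so if you want to use BeautifulSoup you need to create a new "soup" from the page content: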
from bs4 import BeautifulSoup
import urllib2
html = urllib2.urlopen('http://w1.weather.gov/data/obhistory/KCQX.html').read()
wind = BeautifulSoup(html)
The BeautifulSoup constructor also accepts a file-like object, so you can drop the trailing .read():
html = urllib2.urlopen('http://w1.weather.gov/data/obhistory/KCQX.html')
wind = BeautifulSoup(html)
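Once wind is a soup, the row-extraction loop from the question should work against it. Here is a minimal offline sketch of that step, not a definitive implementation. SAMPLE_HTML and the table_rows helper are hypothetical names made up for illustration (the inline markup stands in for the downloaded page), and the parser is named explicitly as "html.parser", which is an assumption on my part rather than something the original code does.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<table>
  <tr><th>Time</th><th>Wind</th></tr>
  <tr><td> 10:52 </td><td> NW 12 </td></tr>
  <tr><td> 09:52 </td><td> N 9 </td></tr>
</table>
"""


def table_rows(html, table_index=0):
    """Return the stripped text of every <td> cell, one list per table row."""
    # Parse the markup first; findAll/find_all only exists on soup objects.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find_all("table")[table_index]
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text().strip() for td in tr.find_all("td")]
        if cells:  # rows holding only <th> cells produce no <td>, so skip them
            rows.append(cells)
    return rows


output_matrix = table_rows(SAMPLE_HTML)
print(output_matrix)
```

For the real page you would pass the result of urlopen(...) (or its .read()) instead of SAMPLE_HTML, and table_index=3 to match tables[3] in the question. find_all is the bs4 spelling of findAll; both names work.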