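I am trying to read an HTML file and scrape the relevant information from it. The code works if I open the file in text format, but if I open it in HTML format I get the following error message: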
"File "C:\Python27\TEST5.py", line 29, in <module>
for record in tab6col.find_all('tr'):
AttributeError: 'NoneType' object has no attribute 'find_all'"
What is the difference between the two approaches? Why doesn't it work when I try to open the file in HTML format?
# Open the CSV output file and set up the writer
filename = r'output.csv'
resultcsv = open(filename, "wb")
output = csv.writer(resultcsv, delimiter=';', quotechar='"', quoting=csv.QUOTE_NONNUMERIC, encoding='latin-1')

# Read the HTML file and parse it
f = codecs.open('proba.html', 'r')
x = f.read()
soup = BeautifulSoup(x, 'lxml')

# Locate the table with class "tab6col" and collect its cells row by row
tab6col = soup.find('table', {"class": "tab6col"})
datatable = []
for record in tab6col.find_all('tr'):
    temp_data = []
    for data in record.find_all('td'):
        temp_data.append(data.text.encode('latin-1'))
    datatable.append(temp_data)

output.writerows(datatable)
resultcsv.close()
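
For reference, here is a minimal diagnostic sketch of how this error can arise. It is not part of my script. `soup.find` returns `None` when no element matches, and calling `find_all` on `None` raises the `AttributeError` shown above. The helper name `extract_rows` and both sample HTML strings are hypothetical, and the sketch uses the built-in `html.parser` instead of `lxml` only to keep it self-contained.

```python
from bs4 import BeautifulSoup


def extract_rows(html, table_class="tab6col"):
    """Return the table's rows as lists of cell text, or None if no table matches."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": table_class})
    if table is None:
        # find() found nothing; calling table.find_all(...) here would raise
        # AttributeError: 'NoneType' object has no attribute 'find_all'
        return None
    rows = []
    for record in table.find_all("tr"):
        rows.append([cell.get_text(strip=True) for cell in record.find_all("td")])
    return rows


# Hypothetical sample inputs, not taken from proba.html
with_table = '<table class="tab6col"><tr><td>a</td><td>b</td></tr></table>'
without_table = '<table class="other"><tr><td>a</td></tr></table>'

print(extract_rows(with_table))     # rows of the matching table
print(extract_rows(without_table))  # None, because no table has class "tab6col"
```

Running a check like this on the contents of `x` would at least show whether the parsed document contains a `table` with class `tab6col` at all in the failing case.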