I'm trying to scrape a bunch of tables from Wikipedia. Here is my code:
from urllib import urlopen
from bs4 import BeautifulSoup
import csv
url="https://en.wikipedia.org/wiki/List_of_colors:_A%E2%80%93F"
html=urlopen(url)
soup=BeautifulSoup(html,'html.parser')
table=soup.find('table',class_='wikitable sortable')
rows=table.findAll('tr')
csvFile=open("colors.csv",'w+')
writer=csv.writer(csvFile)
try:
    for row in rows:
        csvRow=[]
        for cell in row.findAll(['td','th']):
            csvRow.append(cell.get_text().decode("utf-8"))
        try:
            writer.writerow(csvRow)
        except AttributeError:
            print "--"
            continue
except UnicodeEncodeError:
    print "=="
finally:
    csvFile.close()
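I wanted to keep the code simple, but I ran into a lot of errors, so I added some exception handlers to work around them. I still only get the first row. Any help would be appreciated.
Answer 0 (score: 1)
You want to encode, not decode. In Python 2, get_text() returns a unicode object, and calling .decode("utf-8") on it first encodes it implicitly with the ASCII codec. That raises UnicodeEncodeError on the first non-ASCII character; the outer except catches it and the loop ends, which is why only the first row gets written.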
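The failure can be reproduced explicitly as a rough sketch. The snippet below is written for Python 3 and only mimics the two steps Python 2 performs behind the scenes; the helper names `py2_style_decode` and `fails_on` are made up for illustration.

```python
def py2_style_decode(text):
    """Mimic what Python 2 does for unicode_obj.decode("utf-8")."""
    # Step 1 (implicit in Python 2): encode the text with the ASCII codec.
    raw = text.encode("ascii")
    # Step 2: decode the resulting bytes with the requested codec.
    return raw.decode("utf-8")


def fails_on(text):
    """Return True if the Python 2 style decode raises UnicodeEncodeError."""
    try:
        py2_style_decode(text)
    except UnicodeEncodeError:
        return True
    return False
```

Pure ASCII cell text passes through, while any cell containing a character such as an en dash or an accented letter fails at step 1. The script using encode instead follows.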
from urllib import urlopen
from bs4 import BeautifulSoup
import csv
url="https://en.wikipedia.org/wiki/List_of_colors:_A%E2%80%93F"
html=urlopen(url)
soup=BeautifulSoup(html,'html.parser')
table=soup.find('table',class_='wikitable sortable')
rows=table.findAll('tr')
csvFile=open("colors.csv",'w+')
writer=csv.writer(csvFile)
for row in rows:
    csvRow=[]
    for cell in row.findAll(['td','th']):
        csvRow.append(cell.get_text().encode("utf-8"))
        print(cell.get_text())
    writer.writerow(csvRow)
csvFile.close()
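The code above targets Python 2. On Python 3, where str is already Unicode, a per-cell encode is not needed; the file object can do the encoding instead. A minimal sketch of the writing step, assuming the rows have already been collected as lists of strings (for example from cell.get_text()); the function name `write_rows_utf8` is hypothetical:

```python
import csv


def write_rows_utf8(path, rows):
    """Write rows (lists of str) to a UTF-8 encoded CSV file at path."""
    # newline="" lets the csv module control line endings itself;
    # encoding="utf-8" makes the file object encode text on write.
    with open(path, "w", encoding="utf-8", newline="") as csv_file:
        writer = csv.writer(csv_file)
        for row in rows:
            # Strip the surrounding whitespace that table cells often carry.
            writer.writerow([cell.strip() for cell in row])
```

Using a with block also closes the file if an exception is raised, so no separate finally clause is required.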