Web scraping to a CSV file only gets the first row

Time: 2018-12-21 15:30:28

Tags: python python-2.7 csv

I am trying to scrape a bunch of tables from Wikipedia, and this is my code:

from urllib import urlopen
from bs4 import BeautifulSoup
import csv
url="https://en.wikipedia.org/wiki/List_of_colors:_A%E2%80%93F"
html=urlopen(url)
soup=BeautifulSoup(html,'html.parser')
table=soup.find('table',class_='wikitable sortable')
rows=table.findAll('tr')
csvFile=open("colors.csv",'w+')
writer=csv.writer(csvFile)
try:
    for row in rows:
        csvRow=[]
        for cell in row.findAll(['td','th']):
            csvRow.append(cell.get_text().decode("utf-8"))
        try:
            writer.writerow(csvRow)
        except AttributeError: 
            print "--"
            continue
except UnicodeEncodeError:
    print "=="
finally:
    csvFile.close()

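I wanted to write some simple code, but I ran into a lot of errors, so I added some exception handling to work around them. I still only get the first row, though. Any help would be appreciated.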

1 answer:

Answer 0 (score: 1)

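You want to encode, not decode. In Python 2, get_text() returns a unicode object. Calling .decode("utf-8") on a unicode object makes Python first encode it implicitly with the ASCII codec, which raises UnicodeEncodeError at the first cell containing a non-ASCII character. Because your outer try wraps the whole loop, that exception ends the loop, so only the rows written before it reach the file.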
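To make the direction of the two operations concrete, here is a minimal sketch. The sample string and the helper name ascii_encode_fails are illustrative assumptions, not part of the original code. It shows that encode turns text into bytes, decode turns bytes back into text, and that an ASCII encode of non-ASCII text fails with the same exception type the question's code was catching.

```python
from __future__ import print_function

# Hypothetical sample value containing one non-ASCII character (e with acute).
text = u"Caf\u00e9 au lait"

# encode: text -> bytes. This is the step the csv writer needs in Python 2.
encoded = text.encode("utf-8")

# decode: bytes -> text. This is the reverse direction.
round_trip = encoded.decode("utf-8")


def ascii_encode_fails(value):
    """Return True if value cannot be encoded with the ASCII codec."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError:
        return True
    return False


print(repr(encoded))
print(round_trip == text)
print(ascii_encode_fails(text))
```

The ASCII failure in the last call is the same kind of error that Python 2 raises implicitly when .decode() is called on a unicode object holding non-ASCII characters.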

from urllib import urlopen
from bs4 import BeautifulSoup
import csv

url = "https://en.wikipedia.org/wiki/List_of_colors:_A%E2%80%93F"
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', class_='wikitable sortable')
rows = table.findAll('tr')

# Python 2: the csv module writes byte strings, so open the file in binary mode.
csvFile = open("colors.csv", 'wb')
writer = csv.writer(csvFile)
try:
    for row in rows:
        csvRow = []
        for cell in row.findAll(['td', 'th']):
            # get_text() returns unicode; encode it to UTF-8 bytes before writing.
            csvRow.append(cell.get_text().encode("utf-8"))
        writer.writerow(csvRow)
finally:
    # Close the file even if a row fails, without hiding the exception.
    csvFile.close()
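The question is tagged python-2.7, but for readers on Python 3 the manual encode step is not needed: the csv writer accepts text, and the file object handles the encoding. The following is a sketch under that assumption. The names write_rows and read_rows, the sample rows, and the temporary file path are all hypothetical, and the scraping step is left out so the example depends only on the standard library.

```python
import csv
import os
import tempfile


def write_rows(rows, path):
    """Write rows of text cells to a UTF-8 encoded CSV file (Python 3)."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as csv_file:
        csv.writer(csv_file).writerows(rows)


def read_rows(path):
    """Read the CSV file back as a list of rows, each a list of strings."""
    with open(path, "r", encoding="utf-8", newline="") as csv_file:
        return list(csv.reader(csv_file))


# Illustrative rows standing in for the scraped table cells.
sample_rows = [
    ["Name", "Note"],
    ["Caf\u00e9 au lait", "contains a non-ASCII character"],
]

path = os.path.join(tempfile.mkdtemp(), "colors_sample.csv")
write_rows(sample_rows, path)
print(read_rows(path) == sample_rows)
```

In a real script, each row would be built from cell.get_text() for the cells of a table row, with no encode or decode call.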