I have been trying to write the data I scraped to a CSV file. Here is my code:
import requests, bs4, csv, sys
reload(sys)
sys.setdefaultencoding('utf-8')
url = 'http://www.constructeursdefrance.com/resultat/?dpt=01'
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text,'html.parser')
links = []
for div in soup.select('.link'):
    link = div.a.get('href')
    links.append(link)
for i in links:
    url2 = i
    res2 = requests.get(url2)
    soup2 = bs4.BeautifulSoup(res2.text, 'html.parser')
    for each in soup2.select('li > strong'):
        data = each.text, each.next_sibling
    with open('french.csv', 'wb') as file:
        writer = csv.writer(file)
        writer.writerows(data)
The output is:
Traceback (most recent call last):
  File "test_new_project.py", line 23, in <module>
    writer.writerows(data)
csv.Error: sequence expected
But I am trying to write tuples to the CSV file, and as far as I know csv accepts both tuples and lists. How can I fix this?
Answer 0 (score: 0)
Change this
for each in soup2.select('li > strong'):
    data = each.text, each.next_sibling
to this
data = []
for each in soup2.select('li > strong'):
    data.append((each.text, each.next_sibling))
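Your data variable is a single tuple, not a list of tuples. writerows expects a sequence of rows, where each row is itself a sequence of fields, so it treats every element of that one tuple as a separate row. The code above builds a list of tuples instead, one tuple per row.
Another solution is the following (mind the indentation):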
data = []
for i in links:
    url2 = i
    res2 = requests.get(url2)
    soup2 = bs4.BeautifulSoup(res2.text, 'html.parser')
    for each in soup2.select('li > strong'):
        data.append((each.text, each.next_sibling))
# Open the file once, after all links have been processed.
with open('french.csv', 'wb') as file:
    writer = csv.writer(file)
    writer.writerows(data)
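To see the difference between a single tuple and a list of tuples in isolation, here is a minimal sketch with no scraping involved. It assumes Python 3 (the code in the question is Python 2), writes to an in-memory buffer instead of french.csv, and uses made-up sample values. The helper rows_to_csv is hypothetical, and the None stands in for a next_sibling that does not exist, which may or may not be what triggered the error in the question.

```python
import csv
import io


def rows_to_csv(rows):
    """Write rows (an iterable of sequences) to an in-memory CSV and return the text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


# A single tuple: writerows() treats each element as a row of its own.
# The None here stands in for a missing next_sibling (an assumption).
single_tuple = ("Name:", None)

# A list of tuples: each tuple is one row with two fields (made-up values).
list_of_tuples = [("Name:", " Dupont"), ("City:", " Lyon")]

try:
    rows_to_csv(single_tuple)
    single_tuple_error = None
except csv.Error as exc:
    # None is not a sequence, so it cannot be written as a row.
    single_tuple_error = exc

csv_text = rows_to_csv(list_of_tuples)
print(single_tuple_error)
print(csv_text)
```

Note that even when no exception is raised, passing a single tuple of strings gives the wrong result: each string is written as its own row, split into one field per character.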
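Answer 1 (score: 0)
Atirag is right, but you have another problem: the call that opens the output file is nested inside the for loop. If there is more than one link, the file is overwritten on every iteration, so the output will not be what you expect. I think this should produce the output you want: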
for div in soup.select('.link'):
    link = div.a.get('href')
    links.append(link)
# Open the file once, then write one row per match across all links.
with open("french.csv", "w") as file:
    writer = csv.writer(file)
    for i in links:
        res2 = requests.get(i)
        soup2 = bs4.BeautifulSoup(res2.text, 'html.parser')
        for each in soup2.select('li > strong'):
            writer.writerow([each.text, each.next_sibling])
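The overwriting problem can be reproduced without any network access. The sketch below assumes Python 3 and uses made-up rows in place of the scraped pages. The function names and the temporary file names are hypothetical and chosen only for illustration. It compares reopening the file in "w" mode for every page with opening it once around the loop.

```python
import csv
import os
import tempfile

# Made-up stand-ins for the rows scraped from each link.
pages = [
    [("Name:", "A")],
    [("Name:", "B")],
]


def write_reopening_each_time(path, pages):
    # Mode "w" truncates the file, so each page replaces the previous one.
    for rows in pages:
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(rows)


def write_opening_once(path, pages):
    # The file is opened a single time, so rows from all pages accumulate.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for rows in pages:
            writer.writerows(rows)


def read_rows(path):
    with open(path, newline="") as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    reopened = os.path.join(tmp, "reopened.csv")
    once = os.path.join(tmp, "once.csv")
    write_reopening_each_time(reopened, pages)
    write_opening_once(once, pages)
    rows_reopened = read_rows(reopened)
    rows_once = read_rows(once)

print(rows_reopened)
print(rows_once)
```

Only the last page survives when the file is reopened inside the loop, while opening it once keeps the rows from every page.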