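I am scraping some content from this site. After extracting the data from the website and writing it to a CSV file (for example the conference head), the first name comes out wrong: if the word is microsoft, it shows up as osoft, while all the other words appear correctly.
Here is my code: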
import csv
import requests
from bs4 import BeautifulSoup

with open('random.csv', 'w') as csvfile:
    a = csv.writer(csvfile)
    a.writerow(["conferenceHead"])
    url = given above
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    links = soup.find_all("div")
    r_data = soup.find_all("div",{"class":"conferenceHead"})
    for item in r_data:
        conferenceHead = item.contents[1].text
        with open('random.csv','a') as csvfile:
            a = csv.writer(csvfile)
            data = [conferenceHead]
            a.writerow(data)
Answer 0 (score: 1)
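Well, there is a problem in your code: you have nested with open() statements on the same file. The outer handle (mode 'w') still holds the header row in its buffer while the inner handle (mode 'a') appends rows to the file. When the outer block finally exits, its buffer is most likely written at the start of the file, over the beginning of the first appended row, which truncates the string you are saving.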
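The effect can be reproduced without touching the network. What follows is a minimal sketch, assuming Python 3 and default buffering; the function names `write_nested` and `read_lines`, the temporary file, and the sample conference names are made up for illustration and are not part of the original code.

```python
import csv
import os
import tempfile


def write_nested(path, names):
    """Mimic the question's pattern: reopen the file in 'a' mode inside the 'w' block."""
    with open(path, "w", newline="") as outer:
        # The header sits in outer's buffer; nothing has reached the disk yet.
        csv.writer(outer).writerow(["conferenceHead"])
        for name in names:
            with open(path, "a", newline="") as inner:
                # Each inner handle is flushed and closed right away.
                csv.writer(inner).writerow([name])
    # Only now is outer flushed, starting at offset 0 of the file.


def read_lines(path):
    """Return the raw lines of the file, without CSV parsing."""
    with open(path, newline="") as f:
        return f.read().splitlines()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "random.csv")
    write_nested(path, ["International Microsoft Summit", "Second Conference"])
    for line in read_lines(path):
        print(line)
```

Because the header row plus its line terminator is 16 bytes long, the first appended row would be expected to lose roughly its first 16 bytes, which matches the symptom described in the question. Later rows are untouched.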
After fixing this error (removing with open('random.csv','a') as csvfile and fixing the indentation), the code runs and no longer trims the output:
import csv
import requests
from bs4 import BeautifulSoup

# Open the file once and reuse the same writer for the header and every row.
with open('random.csv', 'w') as csvfile:
    a = csv.writer(csvfile)
    a.writerow(["conferenceHead"])
    url = "http://www.allconferences.com/search/index"\
          "/Category__parent_id:1/Venue__country:United%20States"\
          "/Conference__start_date__from:01-01-2010/sort:start_date"\
          "/direction:asc/showLastConference:1/page:7/"
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    links = soup.find_all("div")
    # Every conference title lives in a <div class="conferenceHead">.
    r_data = soup.find_all("div",{"class":"conferenceHead"})
    for item in r_data:
        conferenceHead = item.contents[1].text
        data = [conferenceHead]
        # Write through the single open handle; no second open() call.
        a.writerow(data)
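A related variation, sketched under assumptions rather than taken from the answer: collect the extracted strings first, then write them through a single handle in one go. `save_heads` is a hypothetical helper name, and the sample strings in the demo are invented. Passing `newline=""` is what the Python 3 csv documentation recommends when opening a file for a csv writer.

```python
import csv
import os
import tempfile


def save_heads(path, heads):
    """Write a header row plus one row per string, using a single file handle."""
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["conferenceHead"])
        # One single-column row per extracted heading.
        writer.writerows([head] for head in heads)


if __name__ == "__main__":
    # Invented sample data standing in for the scraped headings.
    demo_path = os.path.join(tempfile.mkdtemp(), "random.csv")
    save_heads(demo_path, ["Alpha Conf", "Beta, Inc. Summit"])
    with open(demo_path, newline="") as f:
        print(list(csv.reader(f)))
```

With the scraping code above, the call would look something like `save_heads("random.csv", [item.contents[1].text for item in r_data])`, assuming `r_data` has already been built.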