Unable to correctly retrieve information into a csv file using Python

Time: 2016-03-03 05:52:52

Tags: python web-scraping beautifulsoup

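I am scraping some content from this site. After extracting it from the website and writing it to a csv file, fields such as conferenceHead come out with the first name wrong: for example, if the word is microsoft, it shows up as osoft, while all the other words appear correctly.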

Here is my code:

import csv
import requests
from bs4 import BeautifulSoup

with open('random.csv', 'w') as csvfile:
    a = csv.writer(csvfile)
    a.writerow(["conferenceHead"])

    url = given above      
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    links = soup.find_all("div")

    r_data = soup.find_all("div",{"class":"conferenceHead"})
    for item in r_data:
        conferenceHead = item.contents[1].text


        with open('random.csv','a') as csvfile:
            a = csv.writer(csvfile)
            data = [conferenceHead]
        a.writerow(data)

1 Answer:

Answer 0: (score: 1)

Well, there are three problems in your code.

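  • The with open() statements are nested (on the same file)
  • The second open, in append mode, is inside the loop, which makes things worse
  • The last writerow is out of scope: its csvfile has already been closed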

Together, these can keep the buffers from being written to the file correctly, truncating the strings you are saving.
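To make the effect concrete, here is a minimal offline sketch that needs no scraping. It assumes Python 3 on CPython with default buffering; the sample names and the helper functions write_nested, write_single, and read_rows are hypothetical and not part of the original code. The likely mechanism is that the header written through the outer handle stays in its buffer until that handle closes, and is then flushed at the start of the file, on top of what the append handle already wrote there.

```python
import csv
import os
import tempfile

NAMES = ["microsoft", "google", "apple"]  # hypothetical sample data


def write_nested(path, names):
    """Problem pattern: reopen the same file in append mode inside the 'w' block."""
    with open(path, "w", newline="") as outer:
        # The header typically sits in outer's buffer; it is not on disk yet.
        csv.writer(outer).writerow(["conferenceHead"])
        for name in names:
            with open(path, "a", newline="") as inner:
                csv.writer(inner).writerow([name])
    # Leaving the outer block flushes the buffered header at offset 0,
    # over the bytes the append handle has already written.


def write_single(path, names):
    """Fixed pattern: one handle and one writer for the whole file."""
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["conferenceHead"])
        for name in names:
            writer.writerow([name])


def read_rows(path):
    """Return the first column of every non-empty row."""
    with open(path, newline="") as csvfile:
        return [row[0] for row in csv.reader(csvfile) if row]


with tempfile.TemporaryDirectory() as tmp:
    nested_path = os.path.join(tmp, "nested.csv")
    single_path = os.path.join(tmp, "single.csv")
    write_nested(nested_path, NAMES)
    write_single(single_path, NAMES)
    nested_rows = read_rows(nested_path)
    single_rows = read_rows(single_path)

# Third problem: a writer whose file was closed by its with block cannot write.
with tempfile.TemporaryFile("w", newline="") as closed_file:
    closed_writer = csv.writer(closed_file)
try:
    closed_writer.writerow(["too late"])
    closed_error = None
except ValueError as exc:
    closed_error = exc

print("nested:", nested_rows)
print("single:", single_rows)
print("closed:", closed_error)
```

The single-handle version should print the header followed by all three names, while the nested version typically loses the beginning of the data. How many characters are lost depends on the header length and on buffering, so the exact damage may differ from the osoft case in the question.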

After fixing these problems (removing with open('random.csv','a') as csvfile and fixing the indentation), the code runs and no longer truncates the output.

import csv
import requests
from bs4 import BeautifulSoup

# Open the file once and keep a single writer for the whole run
with open('random.csv', 'w') as csvfile:
    a = csv.writer(csvfile)
    a.writerow(["conferenceHead"])

    url = "http://www.allconferences.com/search/index"\
          "/Category__parent_id:1/Venue__country:United%20States"\
          "/Conference__start_date__from:01-01-2010/sort:start_date"\
          "/direction:asc/showLastConference:1/page:7/"
    r = requests.get(url)
    soup = BeautifulSoup(r.content)
    links = soup.find_all("div")

    r_data = soup.find_all("div",{"class":"conferenceHead"})

    # Write every row through the same writer, while the file is still open
    for item in r_data:
        conferenceHead = item.contents[1].text
        data = [conferenceHead]
        a.writerow(data)