Moving data scraped with BeautifulSoup into a CSV file seems essential. I'm close to getting it working, but somehow each column in the CSV file ends up being a single letter of the scraped information, and only the last scraped item gets written.
Here is my code:
import urllib2
import csv
from bs4 import BeautifulSoup
url = "http://www.chicagoreader.com/chicago/BestOf?category=4053660&year=2013"
page = urllib2.urlopen(url)
soup_package = BeautifulSoup(page)
page.close()
#find everything in the div class="bestOfItem". This works.
all_categories = soup_package.findAll("div",class_="bestOfItem")
print(winner_category) #print out all winner categories to see if working
#grab just the text in a tag:
for match_categories in all_categories:
    winner_category = match_categories.a.string

#Move to csv file:
f = file("file.csv", 'a')
csv_writer = csv.writer(f)
csv_writer.writerow(winner_category)
print("Check your dropbox for file")
Answer 0 (score: 0)
Put the #Move to csv file: section inside the for loop.
Also, it looks like you are overwriting winner_category on every pass through the for loop, so only the last value is left when the loop ends; using a different variable might be a better idea.
Something like this (untested) should help:
#grab just the text in a tag:
f = file("file.csv", 'a')
for match_categories in all_categories:
    fwinner = match_categories.a.string
    #Move to csv file:
    csv_writer = csv.writer(f)
    csv_writer.writerow(fwinner)
print("Check your dropbox for file")
f.close()
Answer 1 (score: 0)
The problem is that writerow() expects an iterable. In your case it receives a string and splits it into individual characters. Put each value into a list.
Also, you need to do this inside the loop.
Also, you can pass urllib2.urlopen(url) directly to the BeautifulSoup constructor.
Also, you should use the with context manager when working with files.
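As a quick illustration of that splitting behavior (my own sketch, not part of the original answer, using StringIO only so the output can be shown inline):

import csv
from StringIO import StringIO  # Python 2, to match the code in the question

# Passing a bare string: writerow() iterates it character by character,
# so every letter becomes its own column.
buf = StringIO()
csv.writer(buf).writerow("Best View")
print(buf.getvalue())      # B,e,s,t, ,V,i,e,w

# Wrapping the value in a one-element list keeps it in a single column.
buf = StringIO()
csv.writer(buf).writerow(["Best View"])
print(buf.getvalue())      # Best View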
Here is the modified code:
import urllib2
import csv
from bs4 import BeautifulSoup

url = "http://www.chicagoreader.com/chicago/BestOf?category=4053660&year=2013"
soup_package = BeautifulSoup(urllib2.urlopen(url))
all_categories = soup_package.find_all("div", class_="bestOfItem")

with open("file.csv", 'w') as f:
    csv_writer = csv.writer(f)
    for match_categories in all_categories:
        value = match_categories.a.string
        if value:
            csv_writer.writerow([value.encode('utf-8')])
After running the script, the contents of file.csv are:
Best View From a Performance Space
Best Amateur Hip-Hop Dancer Who's Also a Professional Wrestler
Best Dance Venue in New Digs
Best Outré Dance
Best (and Most Vocal) Mime
Best Performance in a Fat Suit
Best Theatrical Use of Unruly Facial Hair
...
Also, I'm not sure you even need the csv module here.
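For a single column of plain strings like this, a minimal sketch of what that could look like without the csv module (my own illustration, reusing the all_categories list from the code above):

with open("file.csv", 'w') as f:
    for match_categories in all_categories:
        value = match_categories.a.string
        if value:
            # write one scraped title per line instead of going through csv.writer
            f.write(value.encode('utf-8') + "\n")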