Moving data scraped with BeautifulSoup into a CSV file seems essential. I'm close to getting it working, but somehow each column in the CSV file ends up being a single letter of the scraped information, and only the last scraped item gets written.
Here is my code:
import urllib2
import csv
from bs4 import BeautifulSoup
url = "http://www.chicagoreader.com/chicago/BestOf?category=4053660&year=2013"
page = urllib2.urlopen(url)
soup_package = BeautifulSoup(page)
page.close()
#find everything in the div class="bestOfItem". This works.
all_categories = soup_package.findAll("div",class_="bestOfItem")
print(winner_category) #print out all winner categories to see if working
#grab just the text in a tag:
for match_categories in all_categories:
    winner_category = match_categories.a.string

#Move to csv file:
f = file("file.csv", 'a')
csv_writer = csv.writer(f)
csv_writer.writerow(winner_category)
print("Check your dropbox for file")
Answer 0 (score: 0)
Put the #Move to csv file: section inside the for loop.
Also, it looks like you are overwriting winner_category on every pass through the for loop, so only the last value is left when the loop ends; using a different variable might be a better idea.
Something like this (untested) should help:
#grab just the text in a tag:
f = file("file.csv", 'a')
for match_categories in all_categories:
    fwinner = match_categories.a.string
    #Move to csv file:
    csv_writer = csv.writer(f)
    csv_writer.writerow(fwinner)
print("Check your dropbox for file")
f.close()
Answer 1 (score: 0)
The problem is that writerow() expects an iterable. In your case it receives a string and splits it into individual characters. Put each value into a list.
Also, you need to do this inside the loop.
Also, you can pass urllib2.urlopen(url) directly to the BeautifulSoup constructor.
Also, you should use the with context manager when working with files.
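As a quick illustration of that splitting behavior (my own sketch, not part of the original answer, using StringIO only so the output can be shown inline):

import csv
from StringIO import StringIO  # Python 2, to match the code in the question

# Passing a bare string: writerow() iterates it character by character,
# so every letter becomes its own column.
buf = StringIO()
csv.writer(buf).writerow("Best View")
print(buf.getvalue())      # B,e,s,t, ,V,i,e,w

# Wrapping the value in a one-element list keeps it in a single column.
buf = StringIO()
csv.writer(buf).writerow(["Best View"])
print(buf.getvalue())      # Best View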
Here is the modified code:
import urllib2
import csv
from bs4 import BeautifulSoup

url = "http://www.chicagoreader.com/chicago/BestOf?category=4053660&year=2013"
soup_package = BeautifulSoup(urllib2.urlopen(url))
all_categories = soup_package.find_all("div", class_="bestOfItem")

with open("file.csv", 'w') as f:
    csv_writer = csv.writer(f)
    for match_categories in all_categories:
        value = match_categories.a.string
        if value:
            csv_writer.writerow([value.encode('utf-8')])
After running the script, the contents of file.csv are:
Best View From a Performance Space
Best Amateur Hip-Hop Dancer Who's Also a Professional Wrestler
Best Dance Venue in New Digs
Best Outré Dance
Best (and Most Vocal) Mime
Best Performance in a Fat Suit
Best Theatrical Use of Unruly Facial Hair
...
Also, I'm not sure you even need the csv module here.
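For a single column of plain strings like this, a minimal sketch of what that could look like without the csv module (my own illustration, reusing the all_categories list from the code above):

with open("file.csv", 'w') as f:
    for match_categories in all_categories:
        value = match_categories.a.string
        if value:
            # write one scraped title per line instead of going through csv.writer
            f.write(value.encode('utf-8') + "\n")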