I'm new to Python, trying to build my first script. I want to scrape a list of URLs and export the results to a CSV file.
My script runs fine, but when I open the CSV file only a few rows of data have been written. When I print the lists I'm trying to write (sharelist and sharelist1), the printout is complete, but the CSV file is not.
Here is part of my code:
for url in urllist[10:1000]:
    # query the website and return the html to the variable 'page'
    try:
        page = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        if e.getcode() == 404: # check the return code
            continue
    soup = BeautifulSoup(page, 'html.parser')
    # Take out the <div> of name and get its value
    name_box = soup.find(attrs={'class': 'nb-shares'})
    if name_box is None:
        continue
    share = name_box.text.strip() # strip() is used to remove starting and trailing whitespace
    # save the data in tuple
    sharelist.append(url)
    sharelist1.append(share)
    # open a file for writing.
    csv_out = open('mycsv.csv', 'wb')
    # create the csv writer object.
    mywriter = csv.writer(csv_out)
    # writerow - one row of data at a time.
    for row in zip(sharelist, sharelist1):
        mywriter.writerow(row)
    # always make sure that you close the file.
    # otherwise you might find that it is empty.
    csv_out.close()
I'm not sure which part of my code I should share here. Please tell me if this isn't enough!
Answer 0 (score: 3)
The problem is that you open the file on every pass through the loop, which overwrites the file's previous contents each time.
# open a file for writing.
csv_out = open('mycsv.csv', 'wb')
# create the csv writer object.
mywriter = csv.writer(csv_out)
# writerow - one row of data at a time.
for row in zip(sharelist, sharelist1):
    mywriter.writerow(row)
# always make sure that you close the file.
# otherwise you might find that it is empty.
csv_out.close()
Open the file before the loop, or open it in append mode.
Here is option one (note the indentation):
# open a file for writing.
csv_out = open('mycsv.csv', 'wb')
# create the csv writer object.
mywriter = csv.writer(csv_out)

for url in urllist[10:1000]:
    try:
        page = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        if e.getcode() == 404: # check the return code
            continue
    soup = BeautifulSoup(page, 'html.parser')
    name_box = soup.find(attrs={'class': 'nb-shares'})
    if name_box is None:
        continue
    share = name_box.text.strip()
    # save the data in tuple
    sharelist.append(url)
    sharelist1.append(share)

# writerow - one row of data at a time.
for row in zip(sharelist, sharelist1):
    mywriter.writerow(row)
# always make sure that you close the file.
# otherwise you might find that it is empty.
csv_out.close()
Here is option two:
for url in urllist[10:1000]:
    # query the website and return the html to the variable 'page'
    try:
        page = urllib2.urlopen(url)
    except urllib2.HTTPError as e:
        if e.getcode() == 404: # check the return code
            continue
    soup = BeautifulSoup(page, 'html.parser')
    # Take out the <div> of name and get its value
    name_box = soup.find(attrs={'class': 'nb-shares'})
    if name_box is None:
        continue
    share = name_box.text.strip() # strip() is used to remove starting and trailing whitespace
    # save the data in tuple
    sharelist.append(url)
    sharelist1.append(share)
    # open the file in append mode.
    csv_out = open('mycsv.csv', 'ab')
    # create the csv writer object.
    mywriter = csv.writer(csv_out)
    # writerow - one row of data at a time.
    for row in zip(sharelist, sharelist1):
        mywriter.writerow(row)
    # always make sure that you close the file.
    # otherwise you might find that it is empty.
    csv_out.close()
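One caveat with option two as written (a concern the last answer below also raises): each iteration re-writes the entire accumulated zip(sharelist, sharelist1) in append mode, so earlier rows get duplicated on every pass. A sketch of a fix, writing only the row produced in the current iteration:

# inside the loop, append just the newest pair instead of the whole list
csv_out = open('mycsv.csv', 'ab')
mywriter = csv.writer(csv_out)
mywriter.writerow((url, share))
csv_out.close()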
Answer 1 (score: 1)
Now that the problem has been found: for files, the best solution is to use the with keyword, which closes the file automatically:
with open('mycsv.csv', 'wb') as csv_out:
    mywriter = csv.writer(csv_out)
    for url in urllist[10:1000]:
        try:
            page = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            if e.getcode() == 404:
                continue
        soup = BeautifulSoup(page, 'html.parser')
        name_box = soup.find(attrs={'class': 'nb-shares'})
        if name_box is None:
            continue
        share = name_box.text.strip()
        # save the data in tuple
        sharelist.append(url)
        sharelist1.append(share)
    for row in zip(sharelist, sharelist1):
        mywriter.writerow(row)
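For context, a with block is roughly equivalent to a try/finally pair, which is why the explicit close() can be dropped; a minimal sketch of what the first line above expands to:

csv_out = open('mycsv.csv', 'wb')
try:
    mywriter = csv.writer(csv_out)
    # ... rest of the block ...
finally:
    csv_out.close()  # runs even if an exception escapes the block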
Answer 2 (score: 0)
Open the file for writing with a context manager, so there's no need to close it explicitly.
with open('mycsv.csv', 'w') as file_obj:
    mywriter = csv.writer(file_obj)
    for url in urllist[10:1000]:
        try:
            page = urllib2.urlopen(url)
        except urllib2.HTTPError as e:
            if e.getcode() == 404: # check the return code
                continue
        soup = BeautifulSoup(page, 'html.parser')
        name_box = soup.find(attrs={'class': 'nb-shares'})
        if name_box is None:
            continue
        share = name_box.text.strip()
        # no need to use zip, and append in 2 lists as they're really expensive calls,
        # and by the looks of it, I think it'll create duplicate rows in your file
        mywriter.writerow((url, share))
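One wrinkle shared by every snippet here: if urlopen raises an HTTPError whose code is not 404, the except block falls through without assigning page, and the next line raises a NameError (or silently reuses the previous iteration's page). A defensive variant of the fetch, re-raising unexpected errors:

try:
    page = urllib2.urlopen(url)
except urllib2.HTTPError as e:
    if e.getcode() == 404:  # skip missing pages
        continue
    raise  # don't fall through on other HTTP errors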