Data scraped with BeautifulSoup prints to the screen but is not saved

Time: 2018-07-11 08:30:34

Tags: python beautifulsoup

My code is as follows:

{{1}}

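All of the data prints to the screen, but only part of it reaches my CSV file. I'm a beginner, so any help is appreciated.

I wonder whether, as in this post (Pandas prints to screen corrently but saves only some data to csv), I am trying to capture too much data at once, but the explanation there is beyond my technical level.

1 answer:

Answer 0: (score: 0)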

artist_csv_file = open('artist_data.csv', 'w')

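Because of the 'w' mode, this line truncates and overwrites the file on every loop iteration. Try 'a' (append) instead; the results of each iteration are then added to the end of the file.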
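To see the difference between the two modes in isolation, here is a minimal sketch. The scratch file name and the three sample rows are made up for the demonstration and have nothing to do with the scraper itself.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'mode_demo.csv')

# Reopening with 'w' inside the loop truncates the file each time,
# so only the last row survives.
for row in ['first', 'second', 'third']:
    with open(path, 'w') as f:
        f.write(row + '\n')
with open(path) as f:
    lines_after_w = f.read().splitlines()

# Start again from an empty file, this time appending with 'a'.
os.remove(path)
for row in ['first', 'second', 'third']:
    with open(path, 'a') as f:
        f.write(row + '\n')
with open(path) as f:
    lines_after_a = f.read().splitlines()

print(lines_after_w)  # ['third']
print(lines_after_a)  # ['first', 'second', 'third']
```

With 'w' only the final row is left in the file, which matches the symptom of data showing up on screen but going missing from the CSV.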

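You should probably also initialize the file before the loop to add the column headers; otherwise a new header row may be written on every iteration.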

[...]
urls = csv.reader(csvf)

# Create (or truncate) artist_data.csv and insert the column headers.
with open('artist_data.csv', 'w+') as artist_csv_file:
    csv_writer = csv.writer(artist_csv_file)
    csv_writer.writerow(['date_text', 'artist', 'track', 'url'])

# This opens the file once to write the column headers; opening it with 'w+' first makes sure any previous content is cleared.
# If the file is always empty when you run the program, you could do it all in one context manager with 'a+'.


# Now open the CSV in append mode and do the scraping.
with open('artist_data.csv', 'a') as artist_csv_file:

    csv_writer = csv.writer(artist_csv_file)

    for url in urls:

        [...]
        # Remove these lines from the loop body; they reopen the file in 'w'
        # mode and rewrite the header on every iteration:
        # artist_csv_file = open('artist_data.csv', 'w')
        # csv_writer = csv.writer(artist_csv_file)
        # csv_writer.writerow(['date_text', 'artist', 'track', 'url'])
        [...]


# No close statement is needed at the end; the with block closes the file.
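As the comment in the code above notes, the header and the rows can also be written inside a single context manager. The following is only a sketch of that idea. It opens the file once with 'w' rather than 'a+', the `scraped_rows` list is a made-up stand-in for whatever the scraping loop produces, and the file is written to a temporary directory rather than next to the script. `newline=''` is what the csv module documentation recommends when opening files in Python 3.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for the rows produced by the scraping loop.
scraped_rows = [
    ['2018-07-01', 'Artist A', 'Track 1', 'http://example.com/1'],
    ['2018-07-02', 'Artist B', 'Track 2', 'http://example.com/2'],
]

# Written to a temporary directory here; the answer uses 'artist_data.csv'.
out_path = os.path.join(tempfile.mkdtemp(), 'artist_data.csv')

# Open the file once: the header is written a single time and every
# row from the loop goes through the same writer.
with open(out_path, 'w', newline='') as artist_csv_file:
    csv_writer = csv.writer(artist_csv_file)
    csv_writer.writerow(['date_text', 'artist', 'track', 'url'])
    for row in scraped_rows:
        csv_writer.writerow(row)

# Read the file back to confirm that the header and all rows were saved.
with open(out_path, newline='') as f:
    saved = list(csv.reader(f))
print(len(saved))  # 3
```

Because the file is opened only once, nothing inside the loop can truncate it, and the file is closed automatically when the with block ends.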

Hope that helps. Have fun!