Saving web-scraped data to a txt file

Asked: 2019-02-04 15:51:11

Tags: python-3.x web-scraping

I am trying to save data that I have already scraped from the New York Times homepage to a txt file.

import urllib.request
from bs4 import BeautifulSoup


# URL of the page to scrape
html_page = 'https://www.nytimes.com/'

# Download the page
page = urllib.request.urlopen(html_page)

# Parse the HTML
soup = BeautifulSoup(page, "html.parser")

# Find every <h2> headline element with this CSS class
title_box = soup.find_all("h2", class_="css-bzeb53 esl82me2")
print(title_box)

# Extract the title text from each element
titles = []
for occurrence in title_box:
    titles.append(occurrence.text.strip())

print(titles)

This works fine so far, but I cannot create the txt file and save the data to it.

# Save the Headlines
filename = '/home/stephan/Documents/NYHeads.txt'
with open(filename, 'w') as file_object:
    file_object.write(titles)

1 Answer:

Answer 0 (score: 0)

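The problem is that file_object.write() only accepts a string, while titles in your program is a list, so the call raises a TypeError. You need to convert titles to a string before writing it. This should work: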

filename = '/home/stephan/Documents/NYHeads.txt'
with open(filename, 'w') as file_object:
    # str(titles) writes the list's printed form, e.g. ['a', 'b'], on one line
    file_object.write(str(titles))
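
If you would rather have one headline per line than the printed form of the list, joining the items with newlines is a common alternative. The sketch below is only an illustration under assumptions: the sample titles and the temporary output path are made up so that it runs without fetching the page, and you would substitute your own scraped titles list and file path.

```python
import os
import tempfile

# Hypothetical sample data standing in for the scraped `titles` list.
titles = ["First headline", "Second headline", "Third headline"]

# Hypothetical output path in a temporary directory so the sketch runs anywhere.
filename = os.path.join(tempfile.mkdtemp(), "NYHeads.txt")

# write() accepts only a string, so join the list items with newlines first.
with open(filename, "w", encoding="utf-8") as file_object:
    file_object.write("\n".join(titles) + "\n")

# Read the file back to confirm there is one headline per line.
with open(filename, encoding="utf-8") as file_object:
    saved = file_object.read().splitlines()
```

Passing encoding="utf-8" explicitly is a precaution here, since headlines often contain curly quotes or accented characters that the platform's default encoding may not handle.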