I am trying to save data that I have already scraped from the New York Times homepage to a txt file.
import urllib.request
from bs4 import BeautifulSoup
# URL
html_page = 'https://www.nytimes.com/'
page = urllib.request.urlopen(html_page)
soup = BeautifulSoup(page, "html.parser")
title_box = soup.find_all("h2", class_="css-bzeb53 esl82me2")
print(title_box)
# Extract titles from list
titles = []
for occurrence in title_box:
    titles.append(occurrence.text.strip())
print(titles)
So far this works fine, but I cannot create the txt file or save the data to it.
# Save the Headlines
filename = '/home/stephan/Documents/NYHeads.txt'
with open(filename, 'w') as file_object:
    file_object.write(titles)
Answer 0 (score: 0)
The problem is that whatever you write to a file must be a string. In your program, titles is a list, so you need to convert titles to a string. This should work:
filename = '/home/stephan/Documents/NYHeads.txt'
with open(filename, 'w') as file_object:
    file_object.write(str(titles))
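Note that str(titles) writes the list's Python representation, brackets and quotes included (something like ['First headline', 'Second headline']). If you would rather have one headline per line, joining the list with newlines is a common alternative. Below is a minimal sketch of that approach; the helper name save_headlines, the sample titles, and the temporary demo path are my own placeholders, not part of the original code, so substitute your scraped titles and your own filename.

```python
import os
import tempfile


def save_headlines(titles, filename):
    """Write each headline on its own line, encoded as UTF-8."""
    with open(filename, "w", encoding="utf-8") as file_object:
        # join() turns the list of strings into one newline-separated string
        file_object.write("\n".join(titles) + "\n")


# Hypothetical sample data standing in for the scraped headlines
sample_titles = ["First headline", "Second headline"]

# Hypothetical demo location; replace with your own path
demo_path = os.path.join(tempfile.gettempdir(), "NYHeads_demo.txt")
save_headlines(sample_titles, demo_path)
```

Passing encoding="utf-8" explicitly is a precaution: headlines often contain curly quotes or accented characters, and the platform default encoding may not be able to represent them.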