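I am creating a CSV file that collects several articles scraped from a website. The articles are obtained by scraping the text from the URLs contained in another file. I would like to read the CSV file as a list in which each article corresponds to one element of the list.
The code I am currently using is: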
import csv
import requests
from bs4 import BeautifulSoup

# Training_news.csv holds the URLs of the articles to scrape
with open('Training_news.csv', newline='') as file:
    reader = csv.reader(file, delimiter=' ')
    for row in reader:
        for url in row:
            r = requests.get(url)
            r.encoding = "ISO-8859-1"
            soup = BeautifulSoup(r.content, 'lxml')
            text = soup.find_all(("p", {"class": "story-body-text story-content"}))
            # write the scraped paragraphs to the output file
            with open('Training_News_5.csv', 'w', newline='') as csvfile:
                spamwriter = csv.writer(csvfile, delimiter=' ')
                spamwriter.writerow(text)
However, the CSV file that is created gives me this:
<p>Advertisement</p>, <p class="byline-dateline"><span class="byline" itemprop.......
<p class="feedback-message">We’re interested in your feedback on this page. <strong>Tell us what you think.</strong></p>, <p class="user-action"><a href="http://www.nytimes.com/">Go to Home Page »</a></p>
Only 50 articles are stored, and the file does not let me select each article individually.
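For reference, here is a minimal sketch of the layout I am after: one article per CSV row, read back as a list with one element per article. It rests on a few assumptions. The function names `fetch_article`, `save_articles` and `load_articles` are made up for illustration. The class names come from the code above and may not match the live pages. In the code above, the extra parentheses in `find_all((...))` appear to pass the tag name and the attribute dict as a single tuple, which would explain why every `<p>` is returned. Opening the output file in `'w'` mode for each URL also replaces whatever was written before.

```python
import csv


def fetch_article(url):
    """Download one page and return its story paragraphs as a single string."""
    # Third-party packages, imported here so the CSV helpers below
    # can be used without them.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(url, timeout=30)
    soup = BeautifulSoup(response.content, "lxml")
    # CSS selector for <p> tags that carry both classes (class names
    # are assumed from the question, not verified against the site).
    paragraphs = soup.select("p.story-body-text.story-content")
    # get_text() drops the HTML tags and keeps only the visible text.
    return " ".join(p.get_text(" ", strip=True) for p in paragraphs)


def save_articles(articles, path):
    """Write one article per row, in a single column."""
    # The file is opened once, so earlier rows are not overwritten.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for article in articles:
            writer.writerow([article])


def load_articles(path):
    """Read the file back as a list with one element per article."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row[0] for row in csv.reader(f) if row]
```

With these helpers, the texts would be collected by calling `fetch_article(url)` for every URL, saved once with `save_articles(articles, 'Training_News_5.csv')`, and a single article selected with `load_articles('Training_News_5.csv')[0]`.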