Question

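I am creating a CSV file that collects several articles scraped from a website. The articles are obtained by scraping the text from URLs listed in another file. I want the CSV file to work as a list in which each article is one element.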

The code I am currently using is:

import csv
import requests
from bs4 import BeautifulSoup


# Training_news.csv holds the article URLs, separated by spaces
with open('Training_news.csv', newline='') as file:
    reader = csv.reader(file, delimiter=' ')
    for row in reader:
        for url in row:
            r = requests.get(url)
            r.encoding = "ISO-8859-1"
            soup = BeautifulSoup(r.content, 'lxml')
            # Look up the paragraph tags on the current page
            text = soup.find_all(("p", {"class": "story-body-text story-content"}))
# Write the scraped paragraphs to a new CSV file
with open('Training_News_5.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ')
    spamwriter.writerow(text)

However, the CSV file that gets created gives me this:

 <p>Advertisement</p>, <p class="byline-dateline"><span class="byline" itemprop.......
 <p class="feedback-message">We’re interested in your feedback on this page. <strong>Tell us what you think.</strong></p>, <p class="user-action"><a href="http://www.nytimes.com/">Go to Home Page »</a></p>
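A minimal sketch of per-article extraction, assuming the article body paragraphs carry the class story-body-text (taken from the selector in the question). Here the tag name and the attribute filter are passed to find_all as separate arguments. In the loop above they are wrapped in a single tuple, which BeautifulSoup appears to treat as the tag-name argument alone, so every <p> on the page matches, which is consistent with "Advertisement" and the feedback message showing up in the output. extract_article and fetch_articles are hypothetical helper names, and html.parser is used instead of lxml only to avoid an extra dependency.

```python
# Minimal sketch (hypothetical helpers): build one string per article.
# Assumes article paragraphs have the CSS class "story-body-text".
from bs4 import BeautifulSoup


def extract_article(html):
    """Return the text of the story paragraphs in `html` as one string."""
    soup = BeautifulSoup(html, "html.parser")
    # Tag name and attribute filter are separate arguments.
    # class_ matches any tag whose class list contains "story-body-text".
    paragraphs = soup.find_all("p", class_="story-body-text")
    return " ".join(p.get_text().strip() for p in paragraphs)


def fetch_articles(urls):
    """Download each URL and return a list with one article string per URL."""
    import requests  # only needed for downloading, not for extract_article

    articles = []
    for url in urls:
        r = requests.get(url)
        # Passing raw bytes lets BeautifulSoup detect the page encoding.
        articles.append(extract_article(r.content))
    return articles
```

Because articles grows by one element per URL, nothing is overwritten between iterations, whereas text in the original loop is reassigned for every URL and only written once after the loop ends.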

Only 50 articles are stored, and I can't select each article individually.
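One way to get one list element per article is to write each article as its own single-field row and read the file back with csv.reader. This sketch assumes the articles are already plain strings; write_articles and read_articles are hypothetical names. It uses the default comma delimiter rather than delimiter=' ', and the csv module quotes any field that contains commas, quotes or line breaks, so each article survives the round trip as one element.

```python
# Minimal sketch: one article per CSV row, read back as a Python list.
import csv


def write_articles(articles, path):
    """Write each article string as a single-field row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # default delimiter ","; fields quoted as needed
        for article in articles:
            writer.writerow([article])


def read_articles(path):
    """Return a list with one element per article row."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row[0] for row in csv.reader(f) if row]
```

By contrast, writerow(text) in the question writes a single row in which every paragraph tag becomes a separate field.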

CSV | Storing text as list elements

0 answers: