I wrote a simple web-scraping script to parse tweets from a news channel's Twitter page. I want it to parse the tweets and write them into a .csv file. The script seems to work fine, but I can't figure out how to get the "tweets" and "news_link" values written under their respective headers!
What am I missing?
import urllib.request
import bs4
import csv

source = urllib.request.urlopen("https://twitter.com/abpnewstv").read()
soup = bs4.BeautifulSoup(source, "lxml")

with open("twitter news.csv", "w", newline="") as csvfile:
    news_writer = csv.writer(csvfile, delimiter=",")
    news_writer.writerow(["tweet", "news_link"])

for content in soup.find_all("div", {"class": "js-tweet-text-container"}):
    tweet = content.p.text.split(".")[0]
    print(tweet)
    try:
        news_link = content.a.text
    except AttributeError:
        pass
    print(news_link + "\n")
Answer 0 (score: 1)
You haven't written anything to the csv file except the header row — you are only printing to stdout. You need to indent the for loop inside the with block and call news_writer.writerow([tweet, news_link]) instead of printing.
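A minimal sketch of that change, assuming soup is the BeautifulSoup object already built in your script (the empty-string fallback for tweets without a link is an assumption, not part of your original code):

with open("twitter news.csv", "w", newline="") as csvfile:
    news_writer = csv.writer(csvfile, delimiter=",")
    news_writer.writerow(["tweet", "news_link"])
    # keep the loop indented so csvfile is still open while writing rows
    for content in soup.find_all("div", {"class": "js-tweet-text-container"}):
        tweet = content.p.text.split(".")[0]
        try:
            news_link = content.a.text
        except AttributeError:
            news_link = ""  # assumed fallback when a tweet has no link
        news_writer.writerow([tweet, news_link])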
Answer 1 (score: 0)
You need two more steps.
Write out each tweet:
news_writer.writerow([tweet, news_link])
and make sure this happens inside the context manager that opens the csv file.
import urllib.request
import bs4
import csv

source = urllib.request.urlopen("https://twitter.com/abpnewstv").read()
soup = bs4.BeautifulSoup(source, "lxml")

with open("twitter news.csv", "w", newline="") as csvfile:
    news_writer = csv.writer(csvfile, delimiter=",")
    news_writer.writerow(["tweet", "news_link"])
    # the loop now runs while the file is still open
    for content in soup.find_all("div", {"class": "js-tweet-text-container"}):
        tweet = content.p.text.split(".")[0]
        print(tweet)
        try:
            news_link = content.a.text
        except AttributeError:
            # no link in this tweet; write an empty cell instead of reusing a stale value
            news_link = ""
        print(news_link + "\n")
        news_writer.writerow([tweet, news_link])
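As a quick sanity check (not part of the original answer), you could read the file back with csv.DictReader to confirm each value landed under the right header:

import csv

# read the file back to confirm each row has both columns
with open("twitter news.csv", newline="") as csvfile:
    for row in csv.DictReader(csvfile):
        print(row["tweet"], "->", row["news_link"])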