Web scraper is not writing to the .csv file

Time: 2018-01-14 18:10:21

Tags: python python-3.x csv twitter

I wrote a simple web-scraping script to parse tweets from a certain news channel. I want it to parse those tweets and write them to a .csv file. The script seems to work fine, but I can't figure out how to write the "tweets" and "news_link" values under their respective headers!

What am I missing?

Code:

import urllib.request
import bs4
import csv

source = urllib.request.urlopen("https://twitter.com/abpnewstv").read()
soup = bs4.BeautifulSoup(source, "lxml")

with open("twitter news.csv", "w", newline="") as csvfile:
    news_writer = csv.writer(csvfile, delimiter=",")
    news_writer.writerow(["tweet", "news_link"])

for content in soup.find_all("div", {"class": "js-tweet-text-container"}):
    tweet = content.p.text.split(".")[0]
    print(tweet)
    try:
        news_link = content.a.text
    except AttributeError:
        pass

    print(news_link + "\n")

2 answers:

Answer 0 (score: 1)

Apart from the header, you haven't written anything to the csv file; you are only printing to stdout. You need to indent the for loop inside the with block and call news_writer.writerow([tweet, news_link]) instead of printing.
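A minimal sketch of that restructuring, reusing the question's imports and selectors (the empty-string fallback for news_link is an addition here, since with a bare pass the first iteration can raise a NameError):

    with open("twitter news.csv", "w", newline="") as csvfile:
        news_writer = csv.writer(csvfile, delimiter=",")
        news_writer.writerow(["tweet", "news_link"])

        # The loop runs inside the with block, so the file is still open.
        for content in soup.find_all("div", {"class": "js-tweet-text-container"}):
            tweet = content.p.text.split(".")[0]
            try:
                news_link = content.a.text
            except AttributeError:
                news_link = ""  # assumed fallback so every row has a value
            news_writer.writerow([tweet, news_link])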

Answer 1 (score: 0)

You need to do two more things:

  1. Write out each tweet:

    news_writer.writerow([tweet, news_link])
    
  2. Make sure this happens inside the context manager that opens the csv file.

  3. Full listing:

    import urllib.request
    import bs4
    import csv
    
    source = urllib.request.urlopen("https://twitter.com/abpnewstv").read()
    soup = bs4.BeautifulSoup(source, "lxml")
    
    with open("twitter news.csv", "w", newline="") as csvfile:
        news_writer = csv.writer(csvfile, delimiter=",")
        news_writer.writerow(["tweet", "news_link"])
    
        for content in soup.find_all("div", {"class": "js-tweet-text-container"}):
            tweet = content.p.text.split(".")[0]
            print(tweet)
            try:
                news_link = content.a.text
            except AttributeError:
                news_link = ""  # fall back to an empty cell so news_link is always defined
    
            print(news_link + "\n")
    
            news_writer.writerow([tweet, news_link])
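As a design alternative not mentioned in the answers, csv.DictWriter maps each value to its header by name, which makes the column alignment explicit; a minimal sketch with hypothetical row values:

    import csv
    
    with open("twitter news.csv", "w", newline="") as csvfile:
        news_writer = csv.DictWriter(csvfile, fieldnames=["tweet", "news_link"])
        news_writer.writeheader()  # writes the "tweet,news_link" header row
        # Hypothetical example row; in the scraper this would come from the parsed tweet.
        news_writer.writerow({"tweet": "Example tweet", "news_link": "https://t.co/example"})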