I scraped Twitter data with Beautiful Soup. I can get the data, but I can't save it to a CSV file

Time: 2018-08-30 19:11:03

Tags: python-3.x twitter web-scraping beautifulsoup

I scraped the username, tweet text, replies, and retweets from Twitter, but I cannot save them to a CSV file.

Here is the code:


I am getting the data, but I cannot save it in a CSV file. Can someone explain how to save the data to a CSV file?

2 Answers:

Answer 0 (score: 2):

As you can see, you only made a small mistake when collecting the tweets: in tweets = soup.find_all("div", {"class":"js-stream-item"}) you forgot to pass the argument keyword, which should look like tweets = soup.find_all("div", attrs={"class":"js-stream-item"}). Note that the working solution below also looks for "li" elements rather than "div".

Here is a working solution, although it only fetches the first 20 tweets:

from urllib.request import urlopen
from bs4 import BeautifulSoup

file = "5_twitterBBC.csv"
f = open(file, "w", encoding="utf-8")
headers = "tweet_user,tweet_text,replies,retweets\n"
f.write(headers)

url = "https://twitter.com/BBCWorld"
html = urlopen(url)
soup = BeautifulSoup(html, "html.parser")

# Each tweet in the timeline is an <li class="js-stream-item"> element
tweets = soup.find_all("li", attrs={"class": "js-stream-item"})

# Write each fetched tweet to the file
for tweet in tweets:
    try:
        if tweet.find('p', {"class": 'tweet-text'}):
            tweet_user = tweet.find('span', {"class": 'username'}).text.strip()
            tweet_text = tweet.find('p', {"class": 'tweet-text'}).text.strip()
            replies = tweet.find('span', {"class": "ProfileTweet-actionCount"}).text.strip()
            retweets = tweet.find('span', {"class": "ProfileTweet-action--retweet"}).text.strip()
            # Build the CSV row with an f-string
            f.write(f'{tweet_user},{tweet_text},{replies},{retweets}\n')
    except AttributeError:
        # Skip tweets where one of the expected elements is missing
        continue
f.close()
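
If the tweet text itself contains commas or quotes, writing the row with a plain f.write produces a malformed CSV. A safer variant is Python's standard csv module, which quotes such fields automatically; the sketch below only changes how the rows are written and reuses the selectors from the code above (whether those selectors still match Twitter's current markup is not guaranteed):

import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://twitter.com/BBCWorld"
soup = BeautifulSoup(urlopen(url), "html.parser")

with open("5_twitterBBC.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["tweet_user", "tweet_text", "replies", "retweets"])
    for tweet in soup.find_all("li", attrs={"class": "js-stream-item"}):
        text_tag = tweet.find("p", {"class": "tweet-text"})
        if not text_tag:
            continue
        try:
            # csv.writer quotes any field that contains a comma or a quote
            writer.writerow([
                tweet.find("span", {"class": "username"}).text.strip(),
                text_tag.text.strip(),
                tweet.find("span", {"class": "ProfileTweet-actionCount"}).text.strip(),
                tweet.find("span", {"class": "ProfileTweet-action--retweet"}).text.strip(),
            ])
        except AttributeError:
            # Skip entries where one of the expected elements is missing
            continue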

Answer 1 (score: 0):

filename = "output.csv"
f = open(filename, "w", encoding="utf-8")
headers = "tweet_user,tweet_text,replies,retweets\n"
f.write(headers)

***your code***

      ***loop***

     # Join the four fields with commas and end the row with a newline
     f.write(",".join([tweet_user, tweet_text, replies, retweets]) + "\n")
f.close()
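
For illustration, with made-up placeholder values (none of these strings come from the original post) the write line above produces one comma-separated row per tweet:

# Hypothetical example values, only to show the shape of one CSV row
tweet_user = "@BBCWorld"
tweet_text = "Example headline"
replies = "12"
retweets = "34"
print(",".join([tweet_user, tweet_text, replies, retweets]))
# @BBCWorld,Example headline,12,34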