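Python: How do I search for tweets and store them in a database?

Asked: 2016-08-30 17:35:41

Tags: python twitter

I have a working Python script that currently prints the last 200 tweets from a given username.

However, I would like to modify it so that it collects the last 200 tweets containing a specific hashtag (from any username), and then I want to store those results in a database.

Can anyone suggest how to modify the code below?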

import sys
import operator
import requests
import json
import twitter

twitter_consumer_key = 'XXXX'
twitter_consumer_secret = 'XXXX'
twitter_access_token = 'XXXX'
twitter_access_secret = 'XXXX'

twitter_api = twitter.Api(consumer_key=twitter_consumer_key, consumer_secret=twitter_consumer_secret, access_token_key=twitter_access_token, access_token_secret=twitter_access_secret)

# Placeholder: screen name of the account whose timeline is fetched
handle = 'XXXX'

statuses = twitter_api.GetUserTimeline(screen_name=handle, count=200, include_rts=False)

for status in statuses:
    if status.lang == 'en':
        print(status)

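3 answers:

Answer 0 (score: 0)

I'm not familiar with the Twitter package, but this may be a suggestion you can use. Depending on how you want to save the tweets, you can replace "print status" with whatever you need. Note, however, that this only lets you filter the 200 tweets already fetched; it does not fetch 200 tweets that contain the specific hashtag.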
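A minimal sketch of the filtering idea, assuming each status object exposes its body as a `text` attribute (as python-twitter's `Status` does). The helper names `has_hashtag` and `filter_by_hashtag` are made up for illustration, and the sample objects are stand-ins for real API results.

```python
import re
from types import SimpleNamespace


def has_hashtag(text, hashtag):
    """Return True if text contains hashtag as a whole tag (case-insensitive)."""
    tag = hashtag.lstrip('#')
    # Lookarounds keep '#python' from matching inside '#pythonista'.
    pattern = r'(?<!\w)#' + re.escape(tag) + r'(?!\w)'
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def filter_by_hashtag(statuses, hashtag):
    """Keep only the statuses whose text contains the given hashtag."""
    return [s for s in statuses if has_hashtag(s.text, hashtag)]


# Stand-in objects (hypothetical data) instead of real timeline results.
sample = [
    SimpleNamespace(text='Learning #Python today'),
    SimpleNamespace(text='No tags here'),
    SimpleNamespace(text='#pythonista life'),
]
matches = filter_by_hashtag(sample, '#python')
for status in matches:
    print(status.text)
```

With the question's script, `filter_by_hashtag(statuses, '#yourtag')` would narrow down the timeline results, but it still only sees the 200 tweets of that one user.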


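Answer 1 (score: 0)

I've attached Java code that prints 100 tweets containing the hashtag '#engineeringproblems' (from any user). You need to add the Twitter API library 'Twitter4J' to your project's libraries.

API download link - http://twitter4j.org/en/index.html#download

Java source code: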

import java.util.ArrayList;
import java.util.List;

import twitter4j.Query;
import twitter4j.QueryResult;
import twitter4j.Status;
import twitter4j.Twitter;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;
import twitter4j.conf.ConfigurationBuilder;

// The class name is illustrative; the original snippet showed only main().
public class HashtagSearch {

    public static void main(String[] args) {

        ConfigurationBuilder cb = new ConfigurationBuilder();
        cb.setDebugEnabled(true)
          .setOAuthConsumerKey("xxxx")
          .setOAuthConsumerSecret("xxxx")
          .setOAuthAccessToken("xxxx")
          .setOAuthAccessTokenSecret("xxxx");

        Twitter twitter = new TwitterFactory(cb.build()).getInstance();
        Query query = new Query("#engineeringproblems");
        int numberOfTweets = 100;
        long lastID = Long.MAX_VALUE;
        List<Status> tweets = new ArrayList<Status>();

        while (tweets.size() < numberOfTweets) {
            // Request at most 100 tweets per call, or fewer if less remain.
            query.setCount(Math.min(100, numberOfTweets - tweets.size()));
            try {
                QueryResult result = twitter.search(query);
                List<Status> page = result.getTweets();
                if (page.isEmpty()) {
                    break; // no more results, so stop instead of looping forever
                }
                tweets.addAll(page);
                System.out.println("Gathered " + tweets.size() + " tweets" + "\n");
                // Track the lowest id seen so the next request fetches older tweets.
                for (Status t : page) {
                    if (t.getId() < lastID) {
                        lastID = t.getId();
                    }
                }
            } catch (TwitterException te) {
                System.out.println("Couldn't connect: " + te);
                break; // give up rather than retrying endlessly
            }
            query.setMaxId(lastID - 1);
        }

        for (int i = 0; i < tweets.size(); i++) {
            Status t = tweets.get(i);

            String user = t.getUser().getScreenName();
            String msg = t.getText();

            System.out.println(i + " USER: " + user + " wrote: " + msg + "\n");
        }
    }
}
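Since the question asks for Python, the same paging pattern (request up to 100 tweets, then set `max_id` to one below the lowest id seen) could be sketched as follows. This is only a sketch: `collect_tweets` and `fake_search` are hypothetical names, and the fake search stands in for a real API call so the example runs offline. With python-twitter, the callable could wrap `twitter_api.GetSearch(term='#engineeringproblems', count=count, max_id=max_id)`, assuming those keyword arguments match your installed version.

```python
from types import SimpleNamespace


def collect_tweets(search_page, total=200, page_size=100):
    """Collect up to `total` tweets by paging backwards with max_id.

    search_page(count, max_id) must return a list of objects with an `id`.
    """
    collected = []
    max_id = None
    while len(collected) < total:
        count = min(page_size, total - len(collected))
        page = search_page(count=count, max_id=max_id)
        if not page:
            break  # no more results available
        collected.extend(page)
        # Next request: only tweets older than the oldest one seen so far.
        max_id = min(s.id for s in page) - 1
    return collected


def fake_search(count, max_id):
    """Stand-in for a real search call: returns descending ids from 250 down."""
    top = 250 if max_id is None else max_id
    return [SimpleNamespace(id=i) for i in range(top, max(top - count, 0), -1)]


tweets = collect_tweets(fake_search, total=200)
print(len(tweets), tweets[0].id, tweets[-1].id)
```

Passing the search as a callable keeps the paging logic independent of any particular Twitter library.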

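Answer 2 (score: 0)

Sorry, but I was looking for a Python solution, and I believe I have finally found one and tested it successfully. The code is below. I'm still looking for a way to modify the script so that each row is inserted into a SQL database, but I hope I can find that elsewhere.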

pip install TwitterSearch

from TwitterSearch import *
try:
    tso = TwitterSearchOrder() # create a TwitterSearchOrder object
    tso.set_keywords(['Guttenberg', 'Doktorarbeit']) # define all the words we want to search for
    tso.set_language('de') # we want to see German tweets only
    tso.set_include_entities(False) # and don't give us all that entity information

    # it's about time to create a TwitterSearch object with our secret tokens
    ts = TwitterSearch(
        consumer_key = 'aaabbb',
        consumer_secret = 'cccddd',
        access_token = '111222',
        access_token_secret = '333444'
    )

    # this is where the fun actually starts :)
    for tweet in ts.search_tweets_iterable(tso):
        print( '@%s tweeted: %s' % ( tweet['user']['screen_name'], tweet['text'] ) )

except TwitterSearchException as e: # take care of all those ugly errors if there are some
    print(e)
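For the remaining database step, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes each tweet is a dict shaped like the ones used above (`tweet['user']['screen_name']`, `tweet['text']`) and that it also carries a numeric `id` key; the function name `save_tweets`, the table layout, and the sample data are made up for illustration.

```python
import sqlite3


def save_tweets(conn, tweets):
    """Insert tweet dicts into a `tweets` table and return the row count."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               id INTEGER PRIMARY KEY,
               screen_name TEXT NOT NULL,
               text TEXT NOT NULL
           )"""
    )
    rows = [(t['id'], t['user']['screen_name'], t['text']) for t in tweets]
    with conn:  # commits on success, rolls back on error
        # Parameterized query; INSERT OR IGNORE skips ids already stored.
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (id, screen_name, text) VALUES (?, ?, ?)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]


# Hypothetical sample data standing in for real search results.
sample = [
    {'id': 1, 'user': {'screen_name': 'alice'}, 'text': 'first #python tweet'},
    {'id': 2, 'user': {'screen_name': 'bob'}, 'text': 'second #python tweet'},
    {'id': 1, 'user': {'screen_name': 'alice'}, 'text': 'first #python tweet'},
]
conn = sqlite3.connect(':memory:')  # use a file path to keep the data
stored = save_tweets(conn, sample)
print(stored)
```

In the script above, the tweets yielded by `ts.search_tweets_iterable(tso)` could be gathered into a list and passed to `save_tweets` in place of `sample`. Using the tweet id as the primary key means re-running the search does not create duplicate rows.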