Question

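I am trying to use Python to extract the locations of tweets from a specific area and write them to a CSV file. I don't know Python very well, but I managed to put together the following script: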

import json
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener

#Enter Twitter API Key information
consumer_key = 'cons_key'
consumer_secret = 'cons_secret'
access_token = 'acc_token'
access_secret = 'acc-secret'

file = open(r"C:\Python27\Output2.csv", "w")
file.write("X,Y\n")

data_list = []
count = 0

class listener(StreamListener):

    def on_data(self, data):
        global count

        #How many tweets you want to find, could change to time based
        if count < 100:
            json_data = json.loads(data)

            coords = json_data["coordinates"]
            if coords is not None:
               print(coords["coordinates"])
               lon = coords["coordinates"][0]
               lat = coords["coordinates"][1]

               data_list.append(json_data)

               file.write(str(lon) + ",")
               file.write(str(lat) + "\n")

               count += 1
            return True
        else:
            file.close()
            return False

    def on_error(self, status):
        print(status)

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
twitterStream = Stream(auth, listener())
#What you want to search for here
twitterStream.filter(locations=[11.01,47.85,12.09,48.43])

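The problem is that it extracts coordinates very slowly (for example, 10 entries every 30 minutes). Is there a way to make this faster?

How can I add a timestamp to each tweet? And is there a way to make sure I retrieve all possible tweets for a specific area (I guess the maximum is all tweets from the past week)?

Thanks a lot!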

Answer 1

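Twitter's standard streaming API delivers a 1% sample of all published tweets. In addition, very few tweets have location data attached. So I am not surprised that you only get a small number of tweets for a specific bounding box in 30 minutes. The only way to increase the volume is to pay for the enterprise PowerTrack API.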
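
To get a feel for how much of the slowdown comes from missing location data, you could count how many of the tweets you receive actually carry a coordinates value. This is only a rough sketch: it assumes the tweets are raw JSON strings like the ones on_data receives, and the helper name geotagged_share and the sample payloads are made up for illustration.

```python
import json


def geotagged_share(raw_tweets):
    """Return (geotagged, total) counts for an iterable of raw tweet JSON strings."""
    total = 0
    geotagged = 0
    for raw in raw_tweets:
        tweet = json.loads(raw)
        total += 1
        # "coordinates" is null (None) when the tweet has no exact point attached
        if tweet.get("coordinates") is not None:
            geotagged += 1
    return geotagged, total


# Hypothetical sample payloads, only for demonstration
samples = [
    json.dumps({"text": "a", "coordinates": None}),
    json.dumps({"text": "b", "coordinates": {"type": "Point", "coordinates": [11.57, 48.14]}}),
    json.dumps({"text": "c", "coordinates": None}),
]

tagged, total = geotagged_share(samples)
print("%d of %d tweets carry coordinates" % (tagged, total))
```

In the script from the question, the same check is what silently drops most tweets: anything where coords is None never reaches the CSV file.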

Every tweet includes a created_at value, which is the timestamp you want to record.
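
As a minimal sketch of how that could look, the function below pulls the coordinates and created_at out of one raw tweet and turns them into a CSV row. It assumes Python 3 (the %z directive used here is not supported by strptime in Python 2) and the classic tweet JSON layout, where created_at looks like "Wed Oct 10 20:19:24 +0000 2018". The name tweet_to_row and the sample payload are hypothetical.

```python
import csv
import io
import json
from datetime import datetime

# Assumed layout of "created_at", e.g. "Wed Oct 10 20:19:24 +0000 2018"
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def tweet_to_row(raw):
    """Return (lon, lat, ISO timestamp) for a geotagged tweet, or None otherwise."""
    tweet = json.loads(raw)
    coords = tweet.get("coordinates")
    if coords is None:
        return None
    # GeoJSON order is longitude first, then latitude
    lon, lat = coords["coordinates"]
    created = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
    return lon, lat, created.isoformat()


# Hypothetical sample payload, only for demonstration
sample = json.dumps({
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "coordinates": {"type": "Point", "coordinates": [11.57, 48.14]},
})

buffer = io.StringIO()  # stands in for the output file
writer = csv.writer(buffer)
writer.writerow(["X", "Y", "created_at"])
row = tweet_to_row(sample)
if row is not None:
    writer.writerow(row)
print(buffer.getvalue())
```

Inside on_data you would call tweet_to_row(data) and write the result whenever it is not None. If you would rather stay on Python 2, writing json_data["created_at"] to the file as a plain string avoids the parsing step entirely.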
