Question

I've found the following piece of code that works well for viewing the standard 1% sample of the Twitter firehose in my Python shell:

import sys
import tweepy

consumer_key=""
consumer_secret=""
access_key = ""
access_secret = "" 


auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)


class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print status.text

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(track=['manchester united'])

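How do I add a filter to only parse tweets from a certain location? I've seen people adding GPS coordinates to other Twitter-related Python code, but I can't find anything specific to sapi in the Tweepy module.

Any ideas?

Thanks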

Answer 1

The streaming API doesn't allow filtering by location AND keyword at the same time.

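Bounding boxes do not act as filters for other filter parameters. For example, track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.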

Source: https://dev.twitter.com/docs/streaming-apis/parameters#locations
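To make the OR behaviour in the quoted documentation concrete, here is a minimal pure-Python sketch. It does not call Twitter or Tweepy. The helper names (in_box, api_would_match, wanted_match) are made up for illustration, and the substring check is only a rough stand-in for how the real track parameter matches terms.

```python
def in_box(lon, lat, box):
    """Return True if (lon, lat) lies inside box = [west, south, east, north]."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def api_would_match(text, coords, keyword, box):
    """Rough model of the documented behaviour: keyword OR location."""
    has_keyword = keyword in text.lower()
    in_area = coords is not None and in_box(coords[0], coords[1], box)
    return has_keyword or in_area


def wanted_match(text, coords, keyword, box):
    """What the question asks for: keyword AND location."""
    has_keyword = keyword in text.lower()
    in_area = coords is not None and in_box(coords[0], coords[1], box)
    return has_keyword and in_area


# Bounding box from the quoted documentation example (San Francisco area).
SAN_FRANCISCO = [-122.75, 36.8, -121.75, 37.8]

# A tweet that mentions the keyword but carries no coordinates.
non_geo = ("I love Twitter", None)
# A tweet from inside the box that does not mention the keyword.
local_only = ("Foggy morning again", (-122.4, 37.7))

print(api_would_match(non_geo[0], non_geo[1], "twitter", SAN_FRANCISCO))
print(wanted_match(non_geo[0], non_geo[1], "twitter", SAN_FRANCISCO))
```

Under this model both sample tweets would be delivered by the stream, but neither satisfies the combined condition, which is why the second condition has to be checked in your own code.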

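What you can do is ask the streaming API for either keyword tweets or located tweets, and then filter the resulting stream in your app by inspecting each tweet.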

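If you modify the code as follows, you will capture tweets from the United Kingdom, and those tweets are then filtered to show only the ones that contain "manchester united":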

import sys
import tweepy

consumer_key=""
consumer_secret=""
access_key=""
access_secret=""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)


class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        if 'manchester united' in status.text.lower():
            print status.text

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())    
sapi.filter(locations=[-6.38,49.87,1.77,55.81])

Answer 2

Juan gave the correct answer. I'm filtering for Germany only, using this:

# Bounding boxes for geolocations
# Online-Tool to create boxes (c+p as raw CSV): http://boundingbox.klokantech.com/
GEOBOX_WORLD = [-180,-90,180,90]
GEOBOX_GERMANY = [5.0770049095, 47.2982950435, 15.0403900146, 54.9039819757]

stream.filter(locations=GEOBOX_GERMANY)

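This is a fairly crude box that includes parts of some other countries. If you want a finer grain, you can combine multiple boxes to cover the area you need.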
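As a sketch of combining boxes: the locations parameter is a flat list of longitude/latitude values, four per box, so several boxes can be concatenated into one list. The helpers combine_boxes and in_any_box below are hypothetical names, not part of Tweepy, and the UK box is simply the one from Answer 1, used here as a second example.

```python
GEOBOX_GERMANY = [5.0770049095, 47.2982950435, 15.0403900146, 54.9039819757]
GEOBOX_UK = [-6.38, 49.87, 1.77, 55.81]  # box used in Answer 1


def combine_boxes(*boxes):
    """Concatenate [west, south, east, north] boxes into one flat list."""
    flat = []
    for box in boxes:
        if len(box) != 4:
            raise ValueError("each box needs exactly 4 values")
        flat.extend(box)
    return flat


def in_any_box(lon, lat, flat_boxes):
    """Return True if (lon, lat) falls inside at least one of the boxes."""
    for i in range(0, len(flat_boxes), 4):
        west, south, east, north = flat_boxes[i:i + 4]
        if west <= lon <= east and south <= lat <= north:
            return True
    return False


combined = combine_boxes(GEOBOX_GERMANY, GEOBOX_UK)
print(len(combined))                       # number of values in the flat list
print(in_any_box(13.405, 52.52, combined))  # a point in Berlin
```

The combined list could then be passed as stream.filter(locations=combined), assuming your Tweepy version accepts a flat list whose length is a multiple of four. in_any_box is handy for re-checking coordinates on the client side.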

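Note, however, that filtering by geotags limits the number of tweets considerably. This is from roughly 5 million tweets in my test database (the query returns the fraction of tweets that actually contain a geolocation):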

> db.tweets.find({coordinates:{$ne:null}}).count() / db.tweets.count()
0.016668392651547598

So only 1.67% of the tweets in my sample of the 1% stream include a geotag. However, there are other ways of determining a user's location: http://arxiv.org/ftp/arxiv/papers/1403/1403.2345.pdf
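If the tweets are held in Python rather than in a database, the same ratio can be computed along these lines. This is only a sketch: it assumes each tweet is a dict with an optional "coordinates" key, and the sample data is made up.

```python
def geotagged_fraction(tweets):
    """Fraction of tweets whose "coordinates" field is present and not None."""
    total = len(tweets)
    if total == 0:
        return 0.0
    tagged = sum(1 for t in tweets if t.get("coordinates") is not None)
    return float(tagged) / total


# Made-up sample: one geotagged tweet out of four.
sample = [
    {"text": "a", "coordinates": {"type": "Point", "coordinates": [13.4, 52.5]}},
    {"text": "b", "coordinates": None},
    {"text": "c"},
    {"text": "d", "coordinates": None},
]

print(geotagged_fraction(sample))
```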

Answer 3

If you are writing the tweets out from the stream, you can't filter them while streaming, but you can filter them at the output stage.
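A minimal sketch of filtering at the output stage, assuming the tweets were saved one JSON object per line and that geotagged tweets carry a GeoJSON point in "coordinates" (longitude first). The function name filter_saved_tweets and the sample lines are hypothetical.

```python
import io
import json


def filter_saved_tweets(lines, keyword, box):
    """Yield the text of saved tweets that contain keyword and lie inside box."""
    west, south, east, north = box
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        text = tweet.get("text", "")
        coords = tweet.get("coordinates")
        # Skip tweets without the keyword or without a geotag.
        if keyword not in text.lower() or not coords:
            continue
        lon, lat = coords["coordinates"]
        if west <= lon <= east and south <= lat <= north:
            yield text


# Stand-in for a file of saved tweets (made-up data).
SAMPLE = io.StringIO(
    u'{"text": "Manchester United won", "coordinates": '
    u'{"type": "Point", "coordinates": [-2.29, 53.46]}}\n'
    u'{"text": "Manchester United lost", "coordinates": null}\n'
    u'{"text": "Nice weather", "coordinates": '
    u'{"type": "Point", "coordinates": [-0.12, 51.5]}}\n'
)
UK_BOX = [-6.38, 49.87, 1.77, 55.81]  # box used in Answer 1

matches = list(filter_saved_tweets(SAMPLE, "manchester united", UK_BOX))
print(matches)
```

With a real file you would pass the open file object instead of SAMPLE, since the function only needs an iterable of lines.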

Answer 4

sapi.filter(track=['manchester united'], locations=['GPS Coordinates'])

How to add a location filter to the tweepy module
