Tweepy stream filter by location returns geotagged tweets outside my location filter

Time: 2019-11-05 04:39:37

Tags: python api twitter tweepy

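I'm running a Twitter stream on an EC2 instance using tmux. The stream is ingested into a PostgreSQL database on Amazon RDS. I'm using a bounding-box location filter, and the bounding box is small (a few blocks in Washington, D.C.). I only ingest tweets that have Point geometry. The tweets are generally located in the D.C. area, but they fall both inside and outside my bounding box. Can someone explain why these points are not inside my bounding box?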
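One way to check whether a stored point really lies outside the box is a plain point-in-box test. The bounding box uses the same order as the list passed to `stream.filter(locations=...)`: south-west longitude, south-west latitude, north-east longitude, north-east latitude. A GeoJSON Point in `status.coordinates` is also longitude first. The helper name `in_bbox` and the sample point below are hypothetical, for illustration only.

```python
# Hypothetical helper: strict point-in-bounding-box test.
BBOX = [-77.047803, 38.907873, -77.043030, 38.911743]  # [sw_lon, sw_lat, ne_lon, ne_lat]


def in_bbox(lon, lat, bbox):
    """Return True if (lon, lat) lies inside bbox = [sw_lon, sw_lat, ne_lon, ne_lat]."""
    sw_lon, sw_lat, ne_lon, ne_lat = bbox
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat


# A GeoJSON Point (as in status.coordinates) is [longitude, latitude].
point = {"type": "Point", "coordinates": [-77.0454, 38.9098]}
print(in_bbox(*point["coordinates"], BBOX))  # True
```

Swapping the two values (latitude first) makes every D.C. point fail this test. That is a quick way to rule out an axis-order mix-up when plotting the stored points.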

Here are the results after more than 7 hours of streaming: (image: Geotagged_Tweets)

Most of them are in D.C., but only 1 or 2 are inside my box. Some are outside D.C. entirely.

Here is my code:

cat << EOF > send-tweets.py
import tweepy
import psycopg2
from textblob import TextBlob

# Bounding box for stream.filter: [sw_lon, sw_lat, ne_lon, ne_lat], longitude first.
BBOX = [-77.047803, 38.907873, -77.043030, 38.911743]

connection = psycopg2.connect(
    host='XXXX',
    port=XXXX,
    user='XXXX',
    password='XXXX',
    database='XXXX'
    )
db = connection.cursor()

# IF NOT EXISTS lets the script be restarted without failing on an existing table.
# The GEOMETRY column type requires the PostGIS extension.
db.execute("""CREATE TABLE IF NOT EXISTS TwitterTest(
            description text,
            loc text,
            text text,
            name text,
            followers bigint,
            id_str bigint Primary Key,
            retweets bigint,
            polarity text,
            subjectivity text,
            noun_1 text,
            Date_Time timestamp,
            coords GEOMETRY
            )""")
connection.commit()


class StreamListener(tweepy.StreamListener):

    def on_status(self, status):
        # status.retweeted only reflects whether the authenticating user retweeted it;
        # retweets delivered by the stream carry a retweeted_status attribute instead.
        if hasattr(status, 'retweeted_status'):
            return

        # Only store tweets that carry an exact GeoJSON Point.
        coords = status.coordinates
        if coords is None or coords.get('type') != 'Point':
            return

        # GeoJSON order is [longitude, latitude]; WKT expects "POINT(lon lat)".
        lon, lat = coords['coordinates']
        wkt = 'POINT({} {})'.format(lon, lat)

        description = status.user.description
        loc = status.user.location
        text = status.text
        name = status.user.screen_name
        followers = status.user.followers_count
        id_str = status.id_str
        retweets = status.retweet_count
        date_time = status.created_at  # no trailing comma: that would wrap it in a tuple
        blob = TextBlob(text)
        sent = blob.sentiment
        noun_phrases = ', '.join(blob.noun_phrases)  # store the phrase list as plain text

        try:
            db.execute(
                "INSERT INTO TwitterTest(description, loc, text, name, followers, id_str, retweets, "
                "polarity, subjectivity, noun_1, Date_Time, coords) "
                "VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,ST_GeomFromText(%s,4326))",
                (description, loc, text, name, followers, id_str, retweets,
                 sent.polarity, sent.subjectivity, noun_phrases, date_time, wkt))
            connection.commit()
        except psycopg2.Error as err:
            # psycopg2 (not SQLAlchemy) raises here; roll back so the next insert can run.
            connection.rollback()
            print(err)

    def on_error(self, status_code):
        if status_code == 420:
            # Returning False in on_error disconnects the stream (420 = rate limited).
            return False

auth = tweepy.OAuthHandler("XXXX", "XXXX")
auth.set_access_token('XXXX', 'XXXX')
api = tweepy.API(auth)

stream_listener = StreamListener()
stream = tweepy.Stream(auth=api.auth, listener=stream_listener)
stream.filter(locations=BBOX)
EOF
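Twitter's documentation for the `locations` parameter describes a matching heuristic that can fall back to the tweet's `place` bounding box. The server-side match may therefore be looser than a strict point-in-box test, so it may be safest to re-check each point on the client before the INSERT. The sketch below assumes `status.coordinates` is a GeoJSON dict as delivered by tweepy. The function name `point_wkt_if_inside` is hypothetical.

```python
# Hypothetical client-side guard: build WKT only for Points inside the box.
BBOX = [-77.047803, 38.907873, -77.043030, 38.911743]  # [sw_lon, sw_lat, ne_lon, ne_lat]


def point_wkt_if_inside(coordinates, bbox):
    """Return 'POINT(lon lat)' for a GeoJSON Point inside bbox, otherwise None."""
    if not coordinates or coordinates.get("type") != "Point":
        return None
    lon, lat = coordinates["coordinates"]  # GeoJSON: longitude first
    sw_lon, sw_lat, ne_lon, ne_lat = bbox
    if not (sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat):
        return None
    # WKT separates x (longitude) and y (latitude) with a space, not a comma.
    return "POINT({} {})".format(lon, lat)


print(point_wkt_if_inside({"type": "Point", "coordinates": [-77.0454, 38.9098]}, BBOX))
```

Inside `on_status`, this could replace the string building: call `wkt = point_wkt_if_inside(status.coordinates, BBOX)` and `return` when it is `None`. Only in-box points would then reach the database.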

When I use the geocode approach instead, I get different results: all of the tweets fall within the search area.

api.search(geocode="38.909770,-77.045202,0.15mi", result_type = "recent")
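The two queries also describe different shapes. `geocode` is a circle given latitude first, while `locations` is a rectangle given longitude first. A great-circle (haversine) distance check makes the comparison concrete. By this estimate, the box's south-west corner lies roughly 0.19 mi from the geocode center, outside the 0.15 mi radius. The mean Earth radius constant and the helper name `haversine_mi` are assumptions for illustration.

```python
import math

EARTH_RADIUS_MI = 3958.8  # assumed mean Earth radius in miles


def haversine_mi(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


# geocode="38.909770,-77.045202,0.15mi" is latitude first, unlike locations=[lon, lat, ...].
CENTER_LAT, CENTER_LON, RADIUS_MI = 38.909770, -77.045202, 0.15

# South-west corner of the streaming bounding box, written as (lat, lon).
corner = haversine_mi(CENTER_LAT, CENTER_LON, 38.907873, -77.047803)
print(round(corner, 3), corner <= RADIUS_MI)
```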
