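I am running a Twitter stream on an EC2 instance using tmux. The stream is ingesting into a PostgreSQL database on Amazon RDS. I am using a bounding-box location filter. The bounding box is small (a few blocks in Washington, D.C.), and I only ingest tweets that have Point geometry. The tweet locations are generally in the D.C. area, but they fall both inside and outside my bounding box. Can someone explain why these points are not all within my bounding box?
Here are the results after more than 7 hours of streaming: Geotagged_Tweets
Most of the points are in Washington, D.C., but only 1-2 are inside my box. Some are outside Washington, D.C. altogether.
Here is my code: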
cat << EOF > send-tweets.py
import tweepy
from textblob import TextBlob
from sqlalchemy.exc import ProgrammingError
import json
import psycopg2
import datetime
connection = psycopg2.connect(
    host='XXXX',
    port=XXXX,
    user='XXXX',
    password='XXXX',
    database='XXXX'
)
db=connection.cursor()
db.execute("""CREATE TABLE TwitterTest(
    description text,
    loc text,
    text text,
    name text,
    followers bigint,
    id_str bigint Primary Key,
    retweets bigint,
    polarity text,
    subjectivity text,
    noun_1 text,
    Date_Time timestamp,
    coords GEOMETRY
)""")
class StreamListener(tweepy.StreamListener):

    def on_status(self, status):
        if status.retweeted:
            return
        #if status.geo != "Point":
        #    return
        description = status.user.description
        loc = status.user.location
        text = status.text
        name = status.user.screen_name
        user_created = status.user.created_at
        followers = status.user.followers_count
        id_str = status.id_str
        created = status.created_at
        retweets = status.retweet_count
        bg_color = status.user.profile_background_color
        blob = TextBlob(text)
        sent = blob.sentiment
        # Join the noun phrases into one string for the text column
        # (a stray trailing comma used to wrap this value in a 1-tuple).
        noun_phrases = ", ".join(blob.noun_phrases)
        geo = status.geo
        date_time = status.created_at
        coords = status.coordinates

        if geo is not None:
            geo = json.dumps(geo)

        # Only store tweets that carry an exact GeoJSON Point.
        if coords is not None and coords.get("type") == "Point":
            # GeoJSON order is [longitude, latitude]; WKT expects "POINT(lon lat)".
            lon, lat = coords["coordinates"]
            wkt = "POINT({} {})".format(lon, lat)
            try:
                db.execute(
                    "INSERT INTO TwitterTest(description, loc, text, name, followers, "
                    "id_str, retweets, polarity, subjectivity, noun_1, Date_Time, coords) "
                    "VALUES (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,ST_GeomFromText(%s,4326))",
                    (description, loc, text, name, followers, id_str, retweets,
                     sent.polarity, sent.subjectivity, noun_phrases, date_time, wkt))
                connection.commit()
            except psycopg2.Error as err:
                # psycopg2 raises its own exception classes, not sqlalchemy's,
                # so catch those and roll back the failed transaction.
                print(err)
                connection.rollback()

    def on_error(self, status_code):
        if status_code == 420:
            # returning False in on_error disconnects the stream
            return False
auth = tweepy.OAuthHandler("XXXX", "XXXX")
auth.set_access_token('XXXX', 'XXXX')
api = tweepy.API(auth)
stream_listener = StreamListener()
stream = tweepy.Stream(auth=api.auth, listener=stream_listener)
stream.filter(locations=[-77.047803,38.907873,-77.043030,38.911743])
EOF
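To check how many of the stored points actually fall inside the box, a client-side test along these lines could be applied to each tweet's coordinates. This is only a sketch: `in_bbox` and `BBOX` are names made up for illustration (they are not part of tweepy), and it assumes the same west, south, east, north order that I pass to the `locations` filter.

```python
# Hypothetical helper, not part of tweepy: tests a point against the
# same box that is passed to stream.filter(locations=...).
BBOX = [-77.047803, 38.907873, -77.043030, 38.911743]  # west, south, east, north


def in_bbox(lon, lat, bbox=BBOX):
    """Return True if (lon, lat) lies inside or on the edge of bbox."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


# GeoJSON coordinates are ordered [longitude, latitude].
sample = {"type": "Point", "coordinates": [-77.045202, 38.909770]}
lon, lat = sample["coordinates"]
print(in_bbox(lon, lat))  # True: this sample point is the center of the box
```

Calling something like this before the INSERT would drop the out-of-box points, although it would not explain why the stream delivers them in the first place.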
When I use the geocode search method instead, I seem to get different results: all of the tweets fall within the search radius.
api.search(geocode="38.909770,-77.045202,0.15mi", result_type = "recent")
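To confirm that the search results really are within the radius, the distance from the search center can be computed by hand. The sketch below uses the haversine formula with an approximate mean Earth radius; `haversine_miles`, `within_radius`, `CENTER` and `RADIUS_MI` are illustrative names of my own, and the center and radius are simply copied from the `geocode` string above.

```python
import math

EARTH_RADIUS_MI = 3958.8  # approximate mean Earth radius in miles (assumption)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


CENTER = (38.909770, -77.045202)  # (lat, lon) taken from the geocode string
RADIUS_MI = 0.15


def within_radius(lat, lon):
    """Return True if the point is no farther than RADIUS_MI from CENTER."""
    return haversine_miles(CENTER[0], CENTER[1], lat, lon) <= RADIUS_MI


# South-west corner of the streaming bounding box, as (lat, lon).
print(within_radius(38.907873, -77.047803))
```

By this estimate the corners of my bounding box sit a little outside the 0.15 mi circle, so the two queries do not cover exactly the same area and their results are not directly comparable.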