I have a script that streams tweets into my local MongoDB via pymongo:
import json
import pymongo
import tweepy

consumer_key = ""
consumer_secret = ""
access_key = ""
access_secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)


class CustomStreamListener(tweepy.StreamListener):
    def __init__(self, api):
        # Pass the API handle to StreamListener, which stores it as self.api.
        super(CustomStreamListener, self).__init__(api)
        self.db = pymongo.MongoClient().test

    def on_data(self, tweet):
        self.db.tweets.insert(json.loads(tweet))

    def on_error(self, status_code):
        return True  # Don't kill the stream

    def on_timeout(self):
        return True  # Don't kill the stream


sapi = tweepy.streaming.Stream(auth, CustomStreamListener(api))
sapi.filter(locations=[-74, 40, -73, 41])
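At the moment I store the complete tweet, which is far more information than I actually need. How can I change the existing script so that it keeps only the following fields:
i) Hashtag ii) UserID iii) PlaceID iv) Timestamp?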
Answer 0 (score: 1)
In on_data, parse the JSON and pull out the fields you are interested in:
def on_data(self, tweet):
    tweet_parsed = json.loads(tweet)
    # Skip non-tweet messages (e.g. delete or limit notices), which lack 'created_at'.
    if 'created_at' in tweet_parsed:
        # Collect the text of every hashtag in the tweet.
        hashtags = [hashtag['text'] for hashtag in tweet_parsed['entities']['hashtags']]
        # Now get the user id.
        user_id = tweet_parsed['user']['id']
        # 'coordinates' is null unless the tweet carries an exact point.
        coordinates = tweet_parsed.get('coordinates')
        if coordinates:
            # GeoJSON order: longitude first, then latitude.
            longitude, latitude = coordinates['coordinates']
        else:
            longitude = latitude = None
        # Now get the timestamp.
        timestamp = tweet_parsed['created_at']
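The `created_at` value extracted above is a plain string. If you want to run time-range queries later, it may be worth converting it to a `datetime` before storing it, since pymongo saves `datetime` objects as native BSON dates. A minimal sketch, assuming the classic Twitter timestamp layout (for example `Wed Oct 10 20:19:24 +0000 2018`) and an English or C locale for the day and month names; `parse_created_at` and `TWITTER_TIME_FORMAT` are hypothetical names, not part of tweepy:

```python
from datetime import datetime, timezone

# Layout assumed for the 'created_at' field, e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(created_at):
    """Convert a 'created_at' string into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    return parsed.astimezone(timezone.utc)


timestamp = parse_created_at("Wed Oct 10 20:19:24 +0000 2018")
print(timestamp.isoformat())
```

If the stream ever delivers a different layout, `strptime` raises `ValueError`, so you may prefer to catch that and fall back to storing the raw string.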
Answer 1 (score: 0)
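In on_data: instead of passing the raw tweet object to .insert, create a new local object that contains only the fields you need, and copy the values over from the tweet object.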
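One way this could look, as a rough sketch rather than a definitive implementation: a small helper that copies the four requested fields into a fresh dict. `extract_fields` is a hypothetical name, the output keys are arbitrary, and the sample tweet is hand-written to mimic the shape of a parsed tweet (it is not real data). The sketch assumes `place` may be null and that hashtags live under `entities.hashtags`.

```python
def extract_fields(tweet):
    """Return a new dict holding only the fields to store (hypothetical helper)."""
    place = tweet.get("place")  # None when the tweet has no attached place
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return {
        "hashtags": [tag["text"] for tag in hashtags],
        "user_id": tweet["user"]["id"],
        "place_id": place["id"] if place else None,
        "timestamp": tweet["created_at"],
    }


# Hand-written sample with the same shape as a parsed tweet (not real data).
sample_tweet = {
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "text": "Hello #nyc #coffee",
    "user": {"id": 12345, "screen_name": "example_user"},
    "place": {"id": "abc123", "name": "Example Place"},
    "entities": {"hashtags": [{"text": "nyc"}, {"text": "coffee"}]},
}

reduced = extract_fields(sample_tweet)
print(reduced)
```

Inside `on_data`, after checking that `'created_at'` is present in the parsed JSON, the result of `extract_fields(...)` can be handed to `self.db.tweets.insert(...)` in place of the raw tweet. On pymongo 3.0 and later, `insert_one` is the non-deprecated equivalent.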