I'm trying to create a Python script that processes data from Twitter, but I'm having no luck! I don't know what I'm doing wrong.
import datetime
import json
import tweepy  # needed for the tweepy.API(...) call further down
from pymongo import MongoClient
from tweepy import OAuthHandler
from tweepy import Stream
from tweepy.streaming import StreamListener
# Auth Variables
consumer_key = "INSERT_KEY_HERE"
consumer_key_secret = "INSERT_KEY_HERE"
access_token = "INSERT_KEY_HERE"
access_token_secret = "INSERT_KEY_HERE"
# MongoDB connection info
connection = MongoClient('localhost', 27017)
db = connection.TwitterStream
# ensure_index and dropDups are deprecated or removed in newer PyMongo/MongoDB; create_index is the current call
db.tweets.create_index("id", unique=True)
collection = db.tweets
# Key words to be tracked, (hashtags)
keyword_list = ['#MorningAfter', '#Clinton', '#Trump']
class StdOutListener(StreamListener):
    def on_data(self, data):
        # Load the Tweet into the variable "t"
        t = json.loads(data)
        # Pull important data from the tweet to store in the database.
        tweet_id = t['id_str']  # The Tweet ID from Twitter in string format
        text = t['text']  # The entire body of the Tweet
        hashtags = t['entities']['hashtags']  # Any hashtags used in the Tweet
        time_stamp = t['created_at']  # The timestamp of when the Tweet was created
        language = t['lang']  # The language of the Tweet
        # Convert the timestamp string given by Twitter to a date object called "created"
        created = datetime.datetime.strptime(time_stamp, '%a %b %d %H:%M:%S +0000 %Y')
        # Load all of the extracted Tweet data into the variable "tweet" that will be stored into the database
        tweet = {'id': tweet_id, 'text': text, 'hashtags': hashtags, 'language': language, 'created': created}
        # Save the refined Tweet data to MongoDB (insert_one replaces the deprecated insert)
        collection.insert_one(tweet)
        print(tweet_id + "\n")
        return True

    # Prints the reason for an error to your console
    def on_error(self, status):
        print(status)

l = StdOutListener(api=tweepy.API(wait_on_rate_limit=True))
auth = OAuthHandler(consumer_key, consumer_key_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, listener=l)
stream.filter(track=keyword_list)
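This is my script so far. I have tried a few Google searches and compared what I have with what others have, but I can't find the source of the problem. It runs and connects to MongoDB, and I created the correct database, but nothing is being put into the database. I have some debug code that prints the tweet ID, but all it prints is 401, over and over, at intervals of about 5-10 seconds. I also tried some basic examples that I found while googling what I'm trying to do, and still nothing happened. I think it might be a problem with the database connection? Here are some images of the database running. Any ideas are greatly appreciated, thanks!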
Answer 0 (score: 0)
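I finally figured it out! The 401 being printed was the key: it is an authentication error, not a database problem. My system clock was wrong, so I had to sync it with an internet time server and reset it. OAuth requests are signed with a timestamp, so a clock that is far off can make Twitter reject the credentials even when the keys themselves are correct.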
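If you want to check for this before touching your keys, here is a minimal sketch that compares the local clock with the `Date` header of an HTTP response. The helper names (`clock_skew_seconds`, `is_skew_too_large`, `fetch_date_header`) and the 300-second tolerance are my own assumptions for illustration, not something from Tweepy or from Twitter's documentation.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumed tolerance for illustration only; not a documented Twitter limit.
MAX_SKEW_SECONDS = 300


def clock_skew_seconds(date_header, now=None):
    """Return local time minus server time, in seconds.

    date_header is the value of an HTTP "Date" response header,
    e.g. "Wed, 09 Nov 2016 12:00:00 GMT".
    """
    server_time = parsedate_to_datetime(date_header)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - server_time).total_seconds()


def is_skew_too_large(skew, limit=MAX_SKEW_SECONDS):
    """True when the clock is off by more than the assumed tolerance."""
    return abs(skew) > limit


def fetch_date_header(url):
    """Fetch the Date header from any HTTPS server (the URL is up to you)."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=10) as response:
            return response.headers["Date"]
    except HTTPError as err:
        # Error responses normally still carry a Date header.
        return err.headers["Date"]


# Example use (needs network access, so it is left commented out):
# skew = clock_skew_seconds(fetch_date_header("https://api.twitter.com"))
# if is_skew_too_large(skew):
#     print("System clock is off by %.0f seconds; sync it first." % skew)
```

If the reported skew is large, syncing the system clock is worth trying before regenerating the consumer key or access token.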