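I have put together a Python script that uses Tweepy with the Twitter Streaming API to collect tweets containing a certain #hashtag and save them to a database. I host it on my own Ubuntu server, which I log into over ssh from my computer.
I test the code by tweeting the hashtag myself. When I start the script, it streams the first one or two tweets, and then on the next tweet I get the following error: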
```
Traceback (most recent call last):
  File "stream.py", line 69, in <module>
    stream.filter(track=settings.TRACK_TERMS)
  File "build/bdist.linux-x86_64/egg/tweepy/streaming.py", line 450, in filter
  File "build/bdist.linux-x86_64/egg/tweepy/streaming.py", line 364, in _start
  File "build/bdist.linux-x86_64/egg/tweepy/streaming.py", line 297, in _run
  File "build/bdist.linux-x86_64/egg/tweepy/streaming.py", line 266, in _run
  File "build/bdist.linux-x86_64/egg/tweepy/streaming.py", line 323, in _read_loop
tweepy.error.TweepError: Expecting length, unexpected value found
```
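From my testing, this does not seem to be related to the length of the tweet, the position of the hashtag, or any special characters used in the tweet. Why does the script start working, process one or two tweets and add them to the database, and then suddenly throw this error on the next tweet?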
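Every frame in the traceback lies inside `tweepy/streaming.py` and it ends in `_read_loop`, so the exception is raised while Tweepy is reading the raw stream, not inside the `StreamListener` code. The message suggests Tweepy reads the stream in the length-delimited format that Twitter's streaming API offers (`delimited=length`), where each JSON message is preceded by a line giving its size in bytes. The following is a minimal sketch of a reader for that framing, assuming that format; it is not Tweepy's implementation, and `read_length_delimited` is a hypothetical name. It fails in the same way whenever a line it expects to be a length turns out not to be a number, i.e. whenever the reader has lost its place in the stream.

```python
import io
import json


def read_length_delimited(stream):
    """Yield decoded JSON messages from a binary stream framed as
    b"<length in bytes>\\r\\n<message>", skipping blank keep-alive lines.

    Hypothetical helper for illustration; not Tweepy's own reader.
    """
    while True:
        line = stream.readline()
        if not line:  # end of stream
            return
        line = line.strip()
        if not line:  # blank keep-alive line (or a leftover CRLF)
            continue
        if not line.isdigit():
            # The reader is not at the start of a length prefix; this is the
            # situation Tweepy reports as "Expecting length, unexpected value found".
            raise ValueError("expecting length, got %r" % line[:40])
        length = int(line)
        # The prefix counts BYTES, so read raw bytes and decode only afterwards.
        payload = stream.read(length)
        # json.loads ignores surrounding whitespace such as a trailing CRLF.
        yield json.loads(payload.decode("utf-8"))


if __name__ == "__main__":
    raw = b"\r\n"  # a keep-alive newline before the first message
    for msg in [{"text": "plain #tag"}, {"text": "caf\u00e9 \U0001F40D #tag"}]:
        body = json.dumps(msg, ensure_ascii=False).encode("utf-8")
        raw += str(len(body)).encode("ascii") + b"\r\n" + body + b"\r\n"
    for message in read_length_delimited(io.BytesIO(raw)):
        print(message["text"])
```

The second demo message contains multi-byte UTF-8 characters, so its byte length is larger than its character count; reading the payload as bytes is what keeps the reader aligned on the next length prefix.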
Below is my stream.py code (it imports a separate file, "settings.py", which contains my TRACK_TERMS #hashtag and the API keys):
```python
import settings
import dataset
import tweepy
from textblob import TextBlob
from sqlalchemy.exc import ProgrammingError
import json

db = dataset.connect(settings.CONNECTION_STRING)


class StreamListener(tweepy.StreamListener):

    def on_status(self, status):
        print(status.text)
        description = status.user.description
        text = status.text
        name = status.user.screen_name
        followers = status.user.followers_count
        created = status.created_at
        retweets = status.retweet_count
        id_str = status.id_str

        # creating and storing in database
        table = db[settings.TABLE_NAME]
        try:
            table.insert(dict(
                user_description=description,
                text=text,
                user_name=name,
                user_followers=followers,
                id_str=id_str,
                created=created,
                retweet_count=retweets,
            ))
        except ProgrammingError as err:
            print(err)

    def on_error(self, status_code):
        # 420 means Twitter is rate limiting the connection
        if status_code == 420:
            # returning False in on_error disconnects the stream
            return False


auth = tweepy.OAuthHandler(settings.TWITTER_APP_KEY, settings.TWITTER_APP_SECRET)
auth.set_access_token(settings.TWITTER_KEY, settings.TWITTER_SECRET)
api = tweepy.API(auth)

stream_listener = StreamListener()
stream = tweepy.Stream(auth=api.auth, listener=stream_listener)
stream.filter(track=settings.TRACK_TERMS)
```
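To check the database side in isolation from the stream, the same fields that `on_status` stores can be pushed through an insert offline. This is a minimal sketch under stated assumptions: `status_to_row`, `make_db` and `store` are hypothetical helpers, the status is a `types.SimpleNamespace` stand-in rather than a real Tweepy `Status`, and the standard library's `sqlite3` replaces `dataset` so the check runs without network access or API keys.

```python
import sqlite3
from datetime import datetime
from types import SimpleNamespace


def status_to_row(status):
    """Hypothetical helper: the same fields on_status stores, as a dict."""
    return dict(
        user_description=status.user.description,
        text=status.text,
        user_name=status.user.screen_name,
        user_followers=status.user.followers_count,
        id_str=status.id_str,
        created=status.created_at,
        retweet_count=status.retweet_count,
    )


def make_db():
    """Hypothetical helper: in-memory SQLite table standing in for dataset."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tweets (user_description TEXT, text TEXT, user_name TEXT, "
        "user_followers INTEGER, id_str TEXT, created TEXT, retweet_count INTEGER)"
    )
    return conn


def store(conn, row):
    """Insert one row; datetimes are stored as ISO 8601 strings."""
    cols = ", ".join(row)  # column names come from the fixed keys above
    marks = ", ".join("?" for _ in row)
    values = [v.isoformat() if isinstance(v, datetime) else v for v in row.values()]
    conn.execute(f"INSERT INTO tweets ({cols}) VALUES ({marks})", values)


if __name__ == "__main__":
    # Fake status with made-up values (hypothetical test data, not a real tweet).
    fake = SimpleNamespace(
        text="testing #mytag",
        id_str="123",
        created_at=datetime(2016, 1, 1, 12, 0),
        retweet_count=0,
        user=SimpleNamespace(description=None, screen_name="me", followers_count=10),
    )
    conn = make_db()
    store(conn, status_to_row(fake))
    print(conn.execute("SELECT text, user_name, created FROM tweets").fetchall())
```

If rows like this insert cleanly, the per-tweet handling is unlikely to be involved, which is consistent with the traceback never passing through `on_status`.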