I was assigned a database project that processes data from the Twitter stream, and I need a dataset of roughly 20 GB. So I installed the twitter library for Python and wrote the following code:
# Import the necessary package to process data in JSON format
try:
    import json
except ImportError:
    import simplejson as json

from twitter import Twitter, OAuth, TwitterHTTPError, TwitterStream

ACCESS_TOKEN = "<my_token>"
ACCESS_SECRET = "<my_secret>"
CONSUMER_KEY = "<my_key>"
CONSUMER_SECRET = "<my_secret>"

oauth = OAuth(ACCESS_TOKEN, ACCESS_SECRET, CONSUMER_KEY, CONSUMER_SECRET)

twitter_stream = TwitterStream(auth=oauth)
iterator = twitter_stream.statuses.sample()

for tweet in iterator:
    print(json.dumps(tweet))
But the program terminates after receiving only about 725 KB of data. What can I do to collect 20 GB of data from the Twitter stream?