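I am trying to collect Twitter data for a given latitude and longitude, but I am getting an error.
I am also trying to avoid the tweet count limit and the time limit on scraping.
Code: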
import tweepy
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
import pandas as pd
import json
import csv
import sys
import time
reload(sys)
sys.setdefaultencoding('utf8')
ckey = 'XYZ'
csecret = 'XYZ'
atoken = 'XYZ'
asecret = 'XYZ'
OAUTH_KEYS = {'consumer_key':ckey, 'consumer_secret':csecret, 'access_token_key':atoken, 'access_token_secret':asecret}
auth = tweepy.OAuthHandler(OAUTH_KEYS['consumer_key'], OAUTH_KEYS['consumer_secret'])
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
if (not api):
    print ("Can't Authenticate")
    sys.exit(-1)
else:
    print " Scraping data now" # Enter lat and long and radius in Kms q='hello'
    cursor = tweepy.Cursor(api.search,geocode="55.0000,4.0000,1000km",since= '2016-06-27',until='2016-06-28',lang='en',count=100)
    results=[]
    for item in cursor.items(1000): # Remove the limit to 1000
        results.append(item)
def toDataFrame(tweets):
    # Convert to data frame
    DataSet = pd.DataFrame()
    DataSet['tweetID'] = [tweet.id for tweet in tweets]
    DataSet['tweetText'] = [tweet.text.encode('utf-8') for tweet in tweets]
    DataSet['tweetRetweetCt'] = [tweet.retweet_count for tweet in tweets]
    DataSet['tweetFavoriteCt'] = [tweet.favorite_count for tweet in tweets]
    DataSet['tweetSource'] = [tweet.source for tweet in tweets]
    DataSet['tweetCreated'] = [tweet.created_at for tweet in tweets]
    DataSet['userID'] = [tweet.user.id for tweet in tweets]
    DataSet['userScreen'] = [tweet.user.screen_name for tweet in tweets]
    DataSet['userName'] = [tweet.user.name for tweet in tweets]
    DataSet['userCreateDt'] = [tweet.user.created_at for tweet in tweets]
    DataSet['userDesc'] = [tweet.user.description for tweet in tweets]
    DataSet['userFollowerCt'] = [tweet.user.followers_count for tweet in tweets]
    DataSet['userFriendsCt'] = [tweet.user.friends_count for tweet in tweets]
    DataSet['userLocation'] = [tweet.user.location for tweet in tweets]
    DataSet['userTimezone'] = [tweet.user.time_zone for tweet in tweets]
    DataSet['Coordinates'] = [tweet.coordinates for tweet in tweets]
    DataSet['GeoEnabled'] = [tweet.user.geo_enabled for tweet in tweets]
    DataSet['Language'] = [tweet.user.lang for tweet in tweets]
    tweets_place= []
    #users_retweeted = []
    for tweet in tweets:
        if tweet.place:
            tweets_place.append(tweet.place.full_name)
        else:
            tweets_place.append('null')
    DataSet['TweetPlace'] = [i for i in tweets_place]
    #DataSet['UserWhoRetweeted'] = [i for i in users_retweeted]
    return DataSet
DataSet = toDataFrame(results)
DataSet.to_csv('Belgium_27.csv',index=False)
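As a side note on the geocode argument used above: it is a single "latitude,longitude,radius" string with the radius given in km (or mi). A minimal sketch of building it from numbers, assuming a hypothetical helper named make_geocode that is not part of tweepy:

```python
def make_geocode(lat, lon, radius_km):
    """Build a 'lat,long,radius' string with the radius in kilometres.

    make_geocode is a hypothetical helper for illustration, not a tweepy function.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be between -90 and 90")
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be between -180 and 180")
    if radius_km <= 0:
        raise ValueError("radius must be positive")
    # Four decimal places for the coordinates, compact form for the radius.
    return "{:.4f},{:.4f},{:g}km".format(lat, lon, radius_km)


geocode = make_geocode(55.0, 4.0, 1000)
print(geocode)
```

The resulting string could then be passed as geocode=geocode in the Cursor call instead of a hand-typed literal.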
Error:
Traceback (most recent call last):
  File "CS.py", line 23, in <module>
    api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
TypeError: __init__() got an unexpected keyword argument 'wait_on_rate_limit'
What changes are needed to resolve the error and collect the tweets?
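The TypeError itself only says that the constructor being called does not accept the wait_on_rate_limit keyword, which would be consistent with an installed tweepy that differs from the one the code was written for. A minimal sketch of how that could be checked, using two stand-in classes (OldAPI and NewAPI are made up for illustration and are not tweepy classes) so that it runs without the library:

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with the keyword argument `name`."""
    try:
        params = inspect.signature(func).parameters  # Python 3
    except AttributeError:
        # Python 2 fallback: inspect.signature does not exist there.
        spec = inspect.getargspec(func)
        return name in spec.args or spec.keywords is not None
    if name in params:
        return params[name].kind is not inspect.Parameter.POSITIONAL_ONLY
    # A **kwargs parameter would also swallow the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-ins for an older and a newer constructor (hypothetical, not tweepy).
class OldAPI(object):
    def __init__(self, auth_handler=None):
        self.auth = auth_handler


class NewAPI(object):
    def __init__(self, auth_handler=None, wait_on_rate_limit=False):
        self.auth = auth_handler
        self.wait_on_rate_limit = wait_on_rate_limit


print(accepts_keyword(OldAPI.__init__, "wait_on_rate_limit"))
print(accepts_keyword(NewAPI.__init__, "wait_on_rate_limit"))
```

With the real library, the same check would be accepts_keyword(tweepy.API.__init__, "wait_on_rate_limit"), and printing tweepy.__version__ shows which version is actually being imported.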
Edit 1: After upgrading tweepy, I get the following warnings and the program terminates on its own:
Scraping data now
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util/ssl_.py:318: SNIMissingWarning: An HTTPS request has been made, but the SNI (Subject Name Indication) extension to TLS is not available on this platform. This may cause the server to present an incorrect TLS certificate, which can cause validation failures. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#snimissingwarning.
SNIMissingWarning
/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util/ssl_.py:122: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. You can upgrade to a newer version of Python to solve this. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
InsecurePlatformWarning
Edit 2: I changed the indentation of the write statement in the code. The program now runs, but it only produces an empty CSV.
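To narrow down whether the CSV is empty because the search returned nothing or because the write runs before results is filled, a count right before to_csv might help. A minimal sketch, assuming stand-in tweet objects (FakeTweet and summarize are hypothetical names, not part of tweepy or pandas):

```python
from collections import namedtuple

# Stand-in for a tweet object with only the fields the check needs.
FakeTweet = namedtuple("FakeTweet", ["id", "text", "coordinates"])


def summarize(tweets):
    """Count collected tweets and how many of them carry coordinates."""
    tweets = list(tweets)
    with_coords = sum(1 for t in tweets if t.coordinates is not None)
    return {"total": len(tweets), "with_coordinates": with_coords}


sample = [
    FakeTweet(1, "hello", None),
    FakeTweet(2, "hi", {"type": "Point", "coordinates": [4.0, 55.0]}),
]
print(summarize(sample))
print(summarize([]))
```

Calling summarize(results) just before DataSet.to_csv(...) would show a total of 0 if the search itself came back empty.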