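I know there are similar questions about this problem, but my project uses Tweepy for Python, so it is a bit more specific.
I collected one thousand user IDs from the followers of Coca-Cola and Pepsi, and I then look up each user's latest 20 statuses to collect the hashtags they used.
I am using Tweepy's followers_ids and user_timeline APIs, but I keep getting a 401 from Twitter. If I set the number of user IDs to search to just 10 instead of 1000, I sometimes get the results I want, but even then I sometimes get a 401. So it works... sort of. It seems to be the large collections that cause these errors, and I don't know how to get around them.
I know Twitter limits calls, but if I can get 1000 user IDs instantly, why can't I get the statuses? I realize I am trying to fetch 20,000 statuses, but I have tried with only 100 * 20 and even 50 * 20 and I still get 401s. I have reset my system clock several times, but it only works occasionally, and only with the 10 * 20 setting. I'm hoping someone out there has a better, more efficient approach than what I have so far. I am new to the Twitter API and new to Python as well, so hopefully it's just me.
Here is the code: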
import tweepy
import pandas as pd

consumer_key = 'REDACTED'
consumer_secret = 'REDACTED'
access_token = 'REDACTED'
access_token_secret = 'REDACTED'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.secure = True
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)

pepsiUsers = []
cokeUsers = []
cur_pepsiUsers = tweepy.Cursor(api.followers_ids, screen_name='pepsi')
cur_cokeUsers = tweepy.Cursor(api.followers_ids, screen_name='CocaCola')

for user in cur_pepsiUsers.items(1000):
    pepsiUsers.append({ 'userId': user, 'hTags': [], 'favSoda': 'Pepsi' })
    for status in tweepy.Cursor(api.user_timeline, user).items(20):
        status = status._json
        hashtags = status['entities']['hashtags']
        index = len(pepsiUsers) - 1
        if len(hashtags) > 1:
            for ht in hashtags:
                pepsiUsers[index]['hTags'].append(ht['text'])

for user in cur_cokeUsers.items(1000):
    cokeUsers.append({ 'userId': user, 'hTags': [], 'favSoda': 'Coke' })
    for status in tweepy.Cursor(api.user_timeline, user).items(20):
        status = status._json
        hashtags = status['entities']['hashtags']
        index = len(cokeUsers) - 1
        if len(hashtags) > 1:
            for ht in hashtags:
                cokeUsers[index]['hTags'].append(ht['text'])

"""create a master list of coke and pepsi users to write to CSV"""
mergedList = cokeUsers + pepsiUsers

"""here we'll turn empty hashtag lists into blanks and turn all hashtags for each user into a single string
for easier searching with R later"""
for i in mergedList:
    if len(i['hTags']) == 0:
        i['hTags'] = ''
    i['hTags'] = ''.join(i['hTags'])

list_df = pd.DataFrame(mergedList, columns=['userId', 'favSoda', 'hTags'])
list_df.to_csv('test.csv', index=False)
This is the error I get when I run the block containing the api.user_timeline code:
---------------------------------------------------------------------------
TweepError Traceback (most recent call last)
<ipython-input-134-a7658ed899f3> in <module>()
3 for user in cur_pepsiUsers.items(1000):
4 pepsiUsers.append({ 'userId': user, 'hTags': [], 'favSoda': 'Pepsi' })
----> 5 for status in tweepy.Cursor(api.user_timeline, user).items(20):
6 status = status._json
7 hashtags = status['entities']['hashtags']
/Users/.../anaconda/lib/python3.5/site-packages/tweepy/cursor.py in __next__(self)
47
48 def __next__(self):
---> 49 return self.next()
50
51 def next(self):
/Users/.../anaconda/lib/python3.5/site-packages/tweepy/cursor.py in next(self)
195 if self.current_page is None or self.page_index == len(self.current_page) - 1:
196 # Reached end of current page, get the next page...
--> 197 self.current_page = self.page_iterator.next()
198 self.page_index = -1
199 self.page_index += 1
/Users/.../anaconda/lib/python3.5/site-packages/tweepy/cursor.py in next(self)
106
107 if self.index >= len(self.results) - 1:
--> 108 data = self.method(max_id=self.max_id, parser=RawParser(), *self.args, **self.kargs)
109
110 if hasattr(self.method, '__self__'):
/Users/.../anaconda/lib/python3.5/site-packages/tweepy/binder.py in _call(*args, **kwargs)
243 return method
244 else:
--> 245 return method.execute()
246
247 # Set pagination mode
/Users/.../anaconda/lib/python3.5/site-packages/tweepy/binder.py in execute(self)
227 raise RateLimitError(error_msg, resp)
228 else:
--> 229 raise TweepError(error_msg, resp, api_code=api_error_code)
230
231 # Parse the response payload
TweepError: Twitter error response: status code = 401
Answer 0: (score: 1)
Do you just need the Twitter JSON? Given the scope of what you are collecting, you might want to try twarc: https://github.com/edsu/twarc
Answer 1: (score: 0)
Try adding rate limiting when you create the API.
If that does not fully solve the problem, use try/except in Python to catch the error and wait 15 minutes before trying again.
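The catch-and-wait idea can be sketched roughly as below. This is only an illustration, not code from the original answer: `collect_with_retry`, `FakeApiError` and `fake_fetch` are hypothetical names, and the stub stands in for a real `api.user_timeline` call so the example runs offline. With Tweepy 3.x (the version in the traceback) the exception to catch would be `tweepy.TweepError`, and the rate-limit option is `tweepy.API(auth, wait_on_rate_limit=True)`. Treat it as an assumption, not something the traceback confirms, that some of the 401s come from protected accounts; if so, waiting will not help for those users, which is why the sketch skips a user once its retries run out.

```python
import time


def collect_with_retry(user_ids, fetch, retry_on, wait_seconds=15 * 60,
                       max_retries=1, sleep=time.sleep):
    """Call fetch(user_id) for every id, waiting and retrying on errors.

    Hypothetical helper: `fetch` is any callable that returns data for one
    user and raises `retry_on` when the API rejects the request.
    Returns (results, skipped): a dict of successful results and a list of
    ids that still failed after `max_retries` retries.
    """
    results = {}
    skipped = []
    for user_id in user_ids:
        for attempt in range(max_retries + 1):
            try:
                results[user_id] = fetch(user_id)
                break
            except retry_on:
                if attempt == max_retries:
                    # Give up on this user (it may be a protected account).
                    skipped.append(user_id)
                else:
                    # Wait out the rate-limit window before retrying.
                    sleep(wait_seconds)
    return results, skipped


# Offline demonstration with a stub instead of a real Twitter call.
class FakeApiError(Exception):
    """Stand-in for tweepy.TweepError in this sketch."""


def fake_fetch(user_id):
    # Pretend user 2 always fails with a 401.
    if user_id == 2:
        raise FakeApiError("Twitter error response: status code = 401")
    return ["tag%d" % user_id]


# sleep is replaced by a no-op so the demo does not actually wait 15 minutes.
results, skipped = collect_with_retry(
    [1, 2, 3], fake_fetch, FakeApiError, sleep=lambda seconds: None
)
print(results, skipped)
```

With real Tweepy 3.x objects, `fetch` could be something like `lambda uid: list(tweepy.Cursor(api.user_timeline, user_id=uid).items(20))` with `retry_on=tweepy.TweepError`. Logging the skipped IDs makes it easier to see whether the failures are specific to certain users or really depend on how many requests were made.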