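This is driving me crazy. As you can see below, I'm trying to use a simple while loop to run several tweepy searches and append the results to a data frame. For some reason, after pulling the first set of 100 tweets, it just repeats that set instead of performing a new search. Any suggestions would be greatly appreciated.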
import sys
import csv
import pandas as pd
import tweepy
from tweepy import OAuthHandler
consumer_key = ''
consumer_secret = ''
access_token = ''
access_secret = ''
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)
num_results = 200
result_count = 0
last_id = None
df = pd.DataFrame(columns=['Name', 'Location', 'Followers', 'Text', 'Coordinates'])
while result_count < num_results:
    result = api.search(q='', count=100, geocode="38.996918,-104.995826,190mi", since_id=last_id)
    for tweet in result:
        user = tweet.user
        last_id = tweet.id_str
        name = user.name
        friends = user.friends_count
        followers = user.followers_count
        text = tweet.text.encode('utf-8')
        location = user.location
        coordinates = tweet.coordinates
        df.loc[result_count] = pd.Series({'Name': name, 'Location': location, 'Followers': followers, 'Text': text, 'Coordinates': coordinates})
        print(text)
        result_count += 1
# Save to CSV
print("Writing all tables to CSV...")
df.to_csv('out.csv')
print("CSV Export Complete.")
Answer 0 (score: 0)
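The API.search method returns tweets that match the specified query. It is not the Streaming API, so it returns all of its data immediately.
Also, in your query parameters you added count, which specifies the number of statuses to retrieve.
So the problem is that, for your query, every iteration returns the first 100 items of the full result set. Results come back newest first, and since_id only excludes tweets at or below the given ID, so passing the last ID you saw does not move the search on to older tweets.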
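To make the repetition concrete, here is a minimal offline sketch. It does not call Twitter at all: `fake_search`, `ALL_IDS`, and `run_question_loop` are hypothetical stand-ins invented for illustration, and the only assumption carried over from the real endpoint is that results are ordered newest first and that `since_id` keeps only IDs greater than the given one.

```python
# Hypothetical stand-in for the search endpoint: IDs 1..500, returned newest first.
ALL_IDS = list(range(500, 0, -1))


def fake_search(count=100, since_id=None):
    """Return up to `count` IDs that are strictly newer than `since_id`."""
    hits = [i for i in ALL_IDS if since_id is None or i > since_id]
    return hits[:count]


def run_question_loop(num_calls):
    """Mimic the question's loop: feed the last ID seen back in as since_id."""
    last_id = None
    batches = []
    for _ in range(num_calls):
        batch = fake_search(count=100, since_id=last_id)
        if not batch:
            break
        batches.append(batch)
        last_id = batch[-1]  # oldest ID in the batch, like the last tweet in the for loop
    return batches


first, second = run_question_loop(2)
new_in_second = set(second) - set(first)
print(len(first), len(second), len(new_in_second))
```

In this simulation the second call starts again from the newest ID and contributes nothing that the first call had not already returned, which matches the behaviour described in the question.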
I suggest changing the code to something like this:
result = api.search(q='', geocode="38.996918,-104.995826,190mi", since_id=last_id)
for tweet in result:
    user = tweet.user
    last_id = tweet.id_str
    name = user.name
    friends = user.friends_count
    followers = user.followers_count
    text = tweet.text.encode('utf-8')
    location = user.location
    coordinates = tweet.coordinates
    df.loc[result_count] = pd.Series({'Name': name, 'Location': location, 'Followers': followers, 'Text': text, 'Coordinates': coordinates})
    print(text)
    result_count += 1  # advance the row index, as in the question's loop
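If you still need several pages of results, a different approach from the one shown above is to page backwards with max_id, which keeps only tweets at or below a given ID. The following is a rough sketch rather than a tested fix: `collect_tweets` and `FakeAPI` are hypothetical names introduced here, `api` is assumed to expose the same `search(q=..., count=..., geocode=..., max_id=...)` call used in the question, and `FakeAPI` exists only so the sketch runs without credentials.

```python
from types import SimpleNamespace


def collect_tweets(api, query, geocode, total=200, page_size=100):
    """Collect up to `total` tweets, paging backwards in time with max_id."""
    rows = []
    max_id = None  # no upper bound on the first request
    while len(rows) < total:
        params = {"q": query, "geocode": geocode, "count": page_size}
        if max_id is not None:
            params["max_id"] = max_id
        batch = api.search(**params)
        if not batch:
            break  # nothing older is available
        for tweet in batch:
            rows.append({
                "Name": tweet.user.name,
                "Location": tweet.user.location,
                "Followers": tweet.user.followers_count,
                "Text": tweet.text,
                "Coordinates": tweet.coordinates,
            })
        # Next page: only tweets strictly older than the oldest one seen so far.
        max_id = min(tweet.id for tweet in batch) - 1
    return rows[:total]


class FakeAPI:
    """Hypothetical offline stand-in for tweepy.API, for illustration only."""

    def __init__(self, n):
        user = SimpleNamespace(name="demo", location="CO", followers_count=1)
        # Newest tweet (largest ID) first, as assumed for the real endpoint.
        self.tweets = [
            SimpleNamespace(id=i, user=user, text="tweet %d" % i, coordinates=None)
            for i in range(n, 0, -1)
        ]

    def search(self, q, count=100, geocode=None, max_id=None):
        hits = [t for t in self.tweets if max_id is None or t.id <= max_id]
        return hits[:count]


rows = collect_tweets(FakeAPI(250), query="", geocode="38.996918,-104.995826,190mi")
print(len(rows), rows[0]["Text"], rows[-1]["Text"])
```

With a real `tweepy.API` object in place of `FakeAPI`, the returned list of dictionaries can be passed to `pd.DataFrame(rows)` in one step, which also avoids growing the data frame one row at a time.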
Let me know how it goes.