How can I get 1500 tweets? I tried the page parameter and found that it doesn't work, so now I'm stuck on max_id and since_id. I don't understand max_id and since_id. When I run a query, I'd like to get the 1500 most recent tweets as of the moment the query was sent. Here is my code:
# -*- coding: utf-8 -*-
import urllib
import simplejson

def searchTweets(query):
    search = urllib.urlopen("http://search.twitter.com/search.json?q="+query)
    dict = simplejson.loads(search.read())
    counter = 0
    for result in dict["results"]:
        print "*",result["text"].encode('utf-8')
        counter += 1
    print "\n",counter," tweets found","\n"

searchTerm = "steak"
searchTweets(searchTerm+"&rpp=100&page=15")
Does anyone know a solution?
Answer 0 (score: 1)
I got this working for 1200 tweets:
# -*- coding: utf-8 -*-
import urllib
import simplejson

def searchTweets(query, minimum_tweets):
    results = []
    i = 0
    while len(results) < minimum_tweets:
        if i == 0:  # First time through, don't include max_id
            response = urllib.urlopen("http://search.twitter.com/search.json?q="+query+"&rpp=100")
        else:  # On subsequent requests, include max_id
            response = urllib.urlopen("http://search.twitter.com/search.json?q="+query+"&rpp=100&max_id="+max_id)
        response = simplejson.loads(response.read())
        if not response['results']: break  # Stop if no tweets are returned
        max_id = str(long(response['results'][-1]['id_str'])-1)  # Define max_id for the next iteration
        results.extend(response['results'])  # Append this page's tweets to results
        i += 1
    print "\n",len(results)," tweets found","\n"

searchTerm = "steak"
searchTweets(searchTerm, 1200)
The problem is that the Twitter search API fails fairly often, and there is no error handling or retrying here. But it should show you the logic behind max_id. I set max_id to one less than the id of the last tweet pulled down, so there are no duplicates.
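If you want to harden it against those intermittent failures, a simple retry wrapper is one option. This is just a sketch under my own assumptions; fetch_json, max_retries, and delay are names I made up, not anything from the API:

import time
import urllib
import simplejson

def fetch_json(url, max_retries=3, delay=5):
    # Hypothetical helper (not part of the code above): retry a flaky
    # search request a few times, sleeping between attempts.
    for attempt in range(max_retries):
        try:
            response = urllib.urlopen(url)
            return simplejson.loads(response.read())
        except (IOError, ValueError):  # network failure or malformed JSON
            if attempt == max_retries - 1:
                raise
            time.sleep(delay)

You would then call fetch_json(url) inside the loop instead of urllib.urlopen(...) followed by simplejson.loads(...).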
Also, there are more elegant ways to decide whether to include max_id in the URL; this workaround is only needed because max_id doesn't have a default value (I wish it did :p).
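For example (again just a sketch; build_search_url is a hypothetical helper, not part of the API), you could collect the parameters in a dict, only add max_id when you actually have one, and let urllib.urlencode build the query string:

import urllib

def build_search_url(query, max_id=None):
    # Hypothetical helper: gather the params in a dict and only
    # include max_id after the first page of results.
    params = {'q': query, 'rpp': 100}
    if max_id is not None:
        params['max_id'] = max_id
    return "http://search.twitter.com/search.json?" + urllib.urlencode(params)

That also removes the need for the i counter, since the first pass is simply the max_id=None case.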