Question

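I am following this link to fetch all tweets that mention a given query. The code works fine so far; I just want to make sure I really understand all of it, because I don't want to use code when I don't even know how it works. Here is the relevant code: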

def searchMentions(tweetCount, maxTweets, searchQuery, tweetsPerQry, max_id, sinceId):

    while tweetCount < maxTweets:

        if not max_id:
            if not sinceId:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry)
            else:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry, since_id=sinceId)
        else:
            if not sinceId:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry, max_id=str(max_id - 1))
            else:
                new_tweets = api.search(q=searchQuery, count=tweetsPerQry, max_id=str(max_id - 1), since_id=sinceId)

        if not new_tweets:
            print("No new tweets to show")
            break

        for tweet in new_tweets:
            try:
                tweetCount += len(new_tweets)
                max_id = new_tweets[-1].id

                tweetId = tweet.user.id
                username = tweet.user.screen_name
                api.update_status(tweet.text)
                print(tweet.text)

            except tweepy.TweepError as e:
                print(e.reason)

            except StopIteration:
                pass

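My assumptions:

max_id and sinceId are both set to None because no tweets have been found yet, and tweetCount is set to zero. As I understand it, the while loop runs as long as tweetCount < maxTweets. I am not sure why that is, for example why I can't use while True. At first I thought it might have something to do with the API rate limit, but that doesn't really make sense.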

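The function then checks max_id and sinceId. I assume it checks whether there is already a max_id and, if there is none, checks sinceId. If sinceId is None, it simply sets the count parameter to the number of tweets to fetch; otherwise it sets the lower bound to sinceId and sets count to the number of tweets to fetch starting from sinceId. If max_id is not None but sinceId is None, it sets the upper bound to max_id and fetches a certain number of tweets up to and including that bound. So if you have tweets with IDs 1, 2, 3, 4, 5, then count = 3 and max_id = 5 would give tweets 3, 4, 5. Otherwise it sets the lower bound to sinceId and the upper bound to max_id and fetches the tweets "in between". The tweets found are stored in new_tweets.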

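Now the function iterates over all tweets in new_tweets and sets tweetCount to the length of this list. Then max_id is set to new_tweets[-1].id. Since Twitter specifies that max_id is inclusive, I assume it is set to the tweet just before the last one so that no tweet is handled twice; however, I am not sure about this, and I don't see how my function would know the ID of the tweet before the last one. A tweet is then posted that repeats the text of each tweet in new_tweets. So, to sum up, my questions are: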

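Can I use while True instead of while tweetCount < maxTweets? If not, why not?
Is my interpretation of the function correct? If not, where did I go wrong?
What exactly does max_id = new_tweets[-1].id do?
Why isn't sinceId set to a new value inside the for loop? Since sinceId is None from the start, there seems to be no need for the branches where sinceId is not None if we never change its value anywhere.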

Disclaimer: I did read Twitter's explanation of max_id, since_id, count and so on, but it did not answer my questions.

Answer 1

Can I use while True instead of while tweetCount < maxTweets?

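It has been a while since I used the Twitter API, but if I remember correctly, you are limited in the number of calls and tweets you can make per hour. This is to keep Twitter relatively clean. As I recall, maxTweets should be the number of tweets you want to fetch. That is why you probably don't want to use while True, although I believe you can substitute it without problems. Eventually you will hit an exception, which is the API telling you that you have reached the maximum.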

What does max_id = new_tweets[-1].id do?

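Every tweet has an ID, the one you see in the URL when you open the tweet. You can use it to refer to a specific tweet in your code. What this line does is update max_id to the ID of the last tweet in the returned list (basically, it updates a variable). Remember that a negative index refers to an element counted from the end of the list.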
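As a small illustration of the negative index, here is a sketch that uses SimpleNamespace objects as stand-ins for the tweet objects tweepy returns (an assumption made only so the snippet runs without the API; the ids are made up):

```python
from types import SimpleNamespace

# Stand-ins for tweet objects; only the .id attribute matters here.
new_tweets = [SimpleNamespace(id=10), SimpleNamespace(id=9), SimpleNamespace(id=8)]

# Index -1 is the last element, i.e. the oldest tweet in a newest-first page.
max_id = new_tweets[-1].id
print(max_id)  # 8
```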

I am not sure about your other two questions; I will edit this later if I find anything.

Answer 2

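A few months ago I used the same reference for the Search API. I came to understand a few things that may help you. I assume the API returns tweets in an ordered way (descending order of tweet_id).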

Suppose that, for a given query, Twitter has a set of tweets with IDs from 1 to 10 (1 being the oldest and 10 the newest).

1 2 3 4 5 6 7 8 9 10

since_id = lower bound and max_id = upper bound

Twitter returns these tweets from newest to oldest (from 10 down to 1). Let's take a few examples:

# This returns tweets with ids from 5 to 10: since_id is exclusive
# (only ids greater than 4 are returned), while max_id is inclusive.
since_id=4, max_id=10

# There is no lower bound here, so we receive as many tweets as the
# free version of the Twitter Search API permits (i.e. the last 7 days).
# Hence we get tweets with ids 1 to 10 (1 and 10 inclusive).
since_id=None, max_id=10
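To make the bounds concrete, here is a minimal sketch that imitates the filtering locally on plain integers. It assumes the behavior described in Twitter's documentation (since_id returns ids strictly greater than the given value, max_id returns ids less than or equal to it); filter_by_bounds is a hypothetical helper, not part of tweepy.

```python
def filter_by_bounds(ids, since_id=None, max_id=None):
    """Imitate the bounds locally: id > since_id and id <= max_id, newest first."""
    selected = [
        i for i in ids
        if (since_id is None or i > since_id) and (max_id is None or i <= max_id)
    ]
    return sorted(selected, reverse=True)


all_ids = list(range(1, 11))  # tweets 1 (oldest) to 10 (newest)

print(filter_by_bounds(all_ids, since_id=4, max_id=10))  # [10, 9, 8, 7, 6, 5]
print(filter_by_bounds(all_ids, max_id=10))  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```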

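What does max_id = new_tweets[-1].id do?

Suppose the first API call returns only 4 tweets, namely 10, 9, 8, 7. The new_tweets list then becomes (for the sake of explanation I treat it as a list of ids; in reality it is a list of objects):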

new_tweets=[10,9,8,7] 
max_id= new_tweets[-1]   # max_id = 7

Now, when our program hits the API a second time:

max_id = 7
since_id = None

new_tweets = api.search(q=searchQuery, count=tweetsPerQry, max_id=str(max_id - 1), since_id=sinceId)

# We will receive all tweets from 6 to 1 now.
max_id = 6  # max_id=str(max_id -1)
#Therefore
new_tweets = [6,5,4,3,2,1]

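Used this way (as described in the reference), each API call can return at most 100 tweets. The actual number of tweets returned is often fewer than 100 and also depends on how complex the query is; the less complex, the better.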
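The paging described above can be sketched without touching the network. In the snippet below, fake_search is a hypothetical stand-in for api.search that works on plain integer ids, and collect is an illustrative loop rather than the code from the reference:

```python
def fake_search(all_ids, count, max_id=None, since_id=None):
    """Hypothetical stand-in for api.search: newest first, at most `count` ids."""
    matches = [
        i for i in sorted(all_ids, reverse=True)
        if (max_id is None or i <= max_id) and (since_id is None or i > since_id)
    ]
    return matches[:count]


def collect(all_ids, max_tweets, per_query):
    """Page backwards through the ids until enough are collected or none remain."""
    collected, max_id = [], None
    while len(collected) < max_tweets:
        # max_id is inclusive, so ask for ids strictly older than the last one seen.
        upper = None if max_id is None else max_id - 1
        page = fake_search(all_ids, per_query, max_id=upper)
        if not page:
            break  # nothing older is left
        collected.extend(page)
        max_id = page[-1]  # oldest id in this page
    return collected


print(collect(range(1, 11), max_tweets=100, per_query=4))
# [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```

In this sketch the collected list and max_id are updated once per page rather than once per tweet, so the length of the list matches the number of tweets actually fetched.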

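Why isn't sinceId set to a new value inside the for loop? Since sinceId is None from the start, there seems to be no need for the branches where sinceId is not None if we never change its value anywhere.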

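Setting sinceId = None returns tweets going back to the oldest ones available, but I am not sure what the default value of sinceId is if we do not pass it at all.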

Can I use while True instead of while tweetCount < maxTweets?

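You can do that, but then you need to handle the exception you will get when you hit the rate limit (i.e. 100 tweets per call). Keeping the tweetCount condition makes the program easier to manage.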
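A rough sketch of what a while True version with exception handling could look like is shown below. RateLimitReached is a hypothetical stand-in for whatever rate-limit error your tweepy version raises, and make_fake_search is a made-up helper that simulates one rate-limit hit so the snippet runs offline.

```python
import time


class RateLimitReached(Exception):
    """Hypothetical stand-in for the library's rate-limit exception."""


def fetch_all(search_page, wait_seconds=0.0, max_retries=3):
    """Loop until a page comes back empty, pausing whenever the limit is hit."""
    collected, max_id, retries = [], None, 0
    while True:
        try:
            page = search_page(max_id)
        except RateLimitReached:
            retries += 1
            if retries > max_retries:
                break  # give up instead of looping forever
            time.sleep(wait_seconds)
            continue
        if not page:
            break  # no older tweets left
        collected.extend(page)
        max_id = page[-1] - 1  # next call asks for strictly older ids
    return collected


def make_fake_search(ids, per_page, fail_on_call):
    """Build a fake search function that raises once, on the given call number."""
    calls = {"n": 0}

    def search_page(max_id):
        calls["n"] += 1
        if calls["n"] == fail_on_call:
            raise RateLimitReached()
        older = [i for i in sorted(ids, reverse=True) if max_id is None or i <= max_id]
        return older[:per_page]

    return search_page


print(fetch_all(make_fake_search(range(1, 8), per_page=3, fail_on_call=2)))
# [7, 6, 5, 4, 3, 2, 1]
```

With a real client, wait_seconds would need to cover the rate-limit window; the zero default here only keeps the demonstration fast.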

Hope this helps.

tweepy: using max_id and since_id