Tweepy Cursor.pages() with api.search_users returns the same page again and again

Asked: 2017-10-26 18:54:43

Tags: python-3.x tweepy

    import time

    import tweepy

    # consumer_token, consumer_secret, access_token and access_secret
    # are assumed to be defined elsewhere.
    auth = tweepy.OAuthHandler(consumer_token, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth)
    user_objs = []
    name = "phungsuk wangdu"
    id_strs = {}  # maps each user id to the page it was first seen on
    page_no = 0
    try:
        for page in tweepy.Cursor(api.search_users, name).pages(3):
            dup_count = 0
            print("*******  Page", str(page_no))
            print("Length of page", len(page))
            user_objs.extend(page)
            for user_obj in page:
                id_str = user_obj.id_str
                if id_str in id_strs:
                    # print("Duplicate for:", id_str, "from page number:", id_strs[id_str])
                    dup_count += 1
                else:
                    # print(id_str)
                    id_strs[id_str] = page_no
            time.sleep(1)
            print("Duplicates in page", str(page_no), str(dup_count))
            page_no += 1
    except Exception as ex:
        print(ex)

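Using the code above, I am trying to fetch user search results with a tweepy Cursor (Python 3.5.2, tweepy 3.5.0). The results are repeated across the pages requested through the pages parameter. Is this the correct way to query search_users with a tweepy Cursor? The code above gives me results in the following pattern: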

1. For a query with few results (name = "phungsuk wangdu"; a manual search on the Twitter website actually returns 9 results):

    *******  Page 0
    Length of page 2
    Duplicates in page 0 0
    *******  Page 1
    Length of page 2
    Duplicates in page 1 2
    *******  Page 2
    Length of page 2
    Duplicates in page 2 2
    *******  Page 3
    Length of page 2
    Duplicates in page 3 2

2. For a query with many results (name = "jon snow"):

    *******  Page 0
    Length of page 20
    Duplicates in page 0 0
    *******  Page 1
    Length of page 20
    Duplicates in page 1 20
    *******  Page 2
    Length of page 20
    Duplicates in page 2 0
    *******  Page 3
    Length of page 20
    Duplicates in page 3 0

2 Answers:

Answer 0 (score: 1)

Try adding this attribute to the Cursor; it should reduce the duplicates.

    q = <your query> + " -filter:retweets"

Answer 1 (score: 0)

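There are two issues here.

  1. Tweepy's Cursor page iterator starts numbering pages from 0, while Twitter's page numbering starts from 1, so the requests for page 0 and page 1 return the same results.
  2. For page numbers beyond the available results, Twitter returns the results of the last available page again.

I have submitted a pull request to tweepy with fixes for both.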
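Until such a fix is available, one possible workaround for the two issues described above is to page manually, starting at page 1 and stopping as soon as a page contributes no new users. The following is only a minimal sketch, not the code from the pull request: `collect_unique_users`, `fetch_page` and `fake_fetch` are hypothetical names introduced here, and the fake data merely imitates the "last page is repeated" behaviour so that the example runs without credentials.

```python
from types import SimpleNamespace


def collect_unique_users(fetch_page, max_pages=3):
    """Fetch pages 1..max_pages and stop once a page adds no new users.

    fetch_page is any callable taking a 1-based page number and returning
    a list of objects that have an id_str attribute.
    """
    seen_ids = set()
    users = []
    for page_no in range(1, max_pages + 1):  # start at 1, not 0
        page = fetch_page(page_no)
        new_users = [u for u in page if u.id_str not in seen_ids]
        if not new_users:
            # Only repeats came back: treat this as the end of the results.
            break
        for user in new_users:
            seen_ids.add(user.id_str)
            users.append(user)
    return users


# Hypothetical stand-in for the API: it returns the last available page
# again whenever the requested page number is past the end.
_FAKE_PAGES = [["1", "2"], ["3"]]


def fake_fetch(page_no):
    index = min(page_no, len(_FAKE_PAGES)) - 1
    return [SimpleNamespace(id_str=i) for i in _FAKE_PAGES[index]]


users = collect_unique_users(fake_fetch, max_pages=5)
print([u.id_str for u in users])
```

With a real client, the callable could be something like `lambda n: api.search_users(q=name, page=n)`, assuming `api` and `name` are defined as in the question. Stopping on the first page without new users assumes that a page made up entirely of repeats marks the end of the results.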