import time

import tweepy

# consumer_token, consumer_secret, access_token and access_secret
# are the application's credentials, defined elsewhere.
auth = tweepy.OAuthHandler(consumer_token, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

user_objs = []
name = "phungsuk wangdu"
id_strs = {}  # maps each user's id_str to the page it was first seen on
page_no = 0
try:
    for page in tweepy.Cursor(api.search_users, name).pages(3):
        dup_count = 0
        print("******* Page", str(page_no))
        print("Length of page", len(page))
        user_objs.extend(page)
        for user_obj in page:
            id_str = user_obj._json['id_str']
            if id_str in id_strs:
                # print("Duplicate for:", id_str, "from page number:", id_strs[id_str])
                dup_count += 1
            else:
                # print(id_str)
                id_strs[id_str] = page_no
        time.sleep(1)  # pause between page requests
        print("Duplicates in page", str(page_no), str(dup_count))
        page_no += 1
except Exception as ex:
    print(ex)
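Using the code above, I am trying to fetch user search results with a tweepy cursor (Python 3.5.2, tweepy 3.5.0). The results contain duplicates, and the pattern depends on the pages parameter I pass. Is this the correct way to query search_users with a tweepy cursor? The code above gives me results in the following patterns:
1. For a query with few results (name = "phungsuk wangdu"; a manual search on the Twitter website actually returns 9 results):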
******* Page 0
Length of page 2
Duplicates in page 0 0
******* Page 1
Length of page 2
Duplicates in page 1 2
******* Page 2
Length of page 2
Duplicates in page 2 2
******* Page 3
Length of page 2
Duplicates in page 3 2
2. For a query with many results (name = "jon snow"):
******* Page 0
Length of page 20
Duplicates in page 0 0
******* Page 1
Length of page 20
Duplicates in page 1 20
******* Page 2
Length of page 20
Duplicates in page 2 0
******* Page 3
Length of page 20
Duplicates in page 3 0
Answer 0: (score: 1)
Try adding this attribute to the Cursor; it should reduce the duplicates.
q = <your query> + " -filter:retweets"
Answer 1: (score: 0)
There are two problems here.
I have submitted a pull request to tweepy with both fixes.
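Until the cursor itself returns distinct pages, one possible client-side workaround is to drop repeated users by id_str while iterating. The following is only a minimal sketch: collect_unique_users is a hypothetical helper, not part of tweepy, and it assumes each item in a page exposes an id_str attribute, as tweepy's user objects do.

```python
def collect_unique_users(pages):
    """Return users from an iterable of pages, keeping only the first
    occurrence of each id_str.

    `pages` is any iterable of pages, where each page is an iterable of
    objects with an `id_str` attribute (an assumption of this sketch).
    """
    seen = set()   # id_str values already collected
    unique = []    # users in first-seen order
    for page in pages:
        for user in page:
            if user.id_str in seen:
                continue  # duplicate from an earlier (or the same) page
            seen.add(user.id_str)
            unique.append(user)
    return unique
```

With the question's setup it could be called as collect_unique_users(tweepy.Cursor(api.search_users, name).pages(3)). The sketch deliberately does not stop at the first page that contains only duplicates: in the "jon snow" output above, page 1 is entirely duplicates while page 2 contains new users, so stopping early would lose results. The caller is expected to bound the number of pages instead.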