Why does reddit cloudsearch return wrong results when searching by timestamp?

Time: 2016-08-25 16:28:58

Tags: python reddit praw

I have a problem with this search:

    list(r.search('timestamp:{}..{}'.format(ts1,ts2), sort='new', subreddit=subreddit, syntax='cloudsearch',limit=None))

It fetches the ~1000 newest submissions between timestamp ts1 (in my case the subreddit creation time) and ts2.

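What my script does:

  1. Get the newest submissions.
  2. Take the creation time of the second-newest submission and set it as ts2.
  3. Search again with the new timestamp.

If the first search returned submissions 1,2,3,4,5,6,7,8,9, then after the second search I would expect to get 3,4,5,6,7,8,9. Unfortunately I don't get them; I get something like 7,8,9 instead. Any idea why?

My script and sample results are below.

Results: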

    t3_4zh8zw, 1472107937.0
    t3_4zgl1n, 1472096403.0
    t3_4zgf34, 1472093883.0
    t3_4zg8de, 1472091260.0
    t3_4zfzun, 1472087983.0
    t3_4zfysv, 1472087571.0
    t3_4zf8hg, 1472077921.0
    t3_4zf7g6, 1472077542.0
    t3_4zf4p5, 1472076595.0
    t3_4zf0d7, 1472075090.0
    t3_4zeqeg, 1472071708.0
    t3_4zeomz, 1472071134.0
    t3_4zebse, 1472066994.0
    t3_4zduso, 1472061376.0
    t3_4zdtne, 1472061014.0
    #######################
    t3_4zebse, 1472066994.0
    t3_4zduso, 1472061376.0
    t3_4zdtne, 1472061014.0
    t3_4zdipi, 1472057168.0
    t3_4zdfj3, 1472056078.0
    t3_4zd4v3, 1472052437.0
    t3_4zd0l5, 1472051081.0
    t3_4zctiu, 1472048701.0
    t3_4zazqj, 1472016633.0
    t3_4zawm3, 1472015079.0
    t3_4zavyc, 1472014757.0
    t3_4za5hb, 1472003960.0
    t3_4z9ydt, 1472001398.0
    t3_4z9xhx, 1472001065.0
    t3_4z9ufa, 1471999935.0
    

Script:

    import praw
    import time
    
    user_agent = 'clodsearch-timestamp test'
    r = praw.Reddit(user_agent=user_agent)
    
    subreddit = r.get_subreddit('laptops')
    
    ts1 = int(subreddit.created_utc)-1
    ts2 = int(time.time())
    
    submissions = list(r.search('timestamp:{}..{}'.format(ts1,ts2), sort='new', subreddit=subreddit, syntax='cloudsearch',limit=None) )
    
    for submission in submissions[:15]:
        print("{}, {}".format(submission.fullname, submission.created_utc))
    
    ts2 = int(submissions[1].created_utc) - 1
    
    print('#######################')
    
    submissions = list(r.search('timestamp:{}..{}'.format(ts1,ts2), sort='new', subreddit=subreddit, syntax='cloudsearch',limit=None) )
    
    for submission in submissions[:15]:
        print("{}, {}".format(submission.fullname, submission.created_utc))
    

1 answer:

Answer 0 (score: 1)

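As far as I can gather, with cloudsearch you are not supposed to use created_utc.

If you change submission.created_utc to submission.created, you will get the desired behavior.

This is because cloudsearch uses the epoch time directly. There is no need to convert it to UTC or GMT; doing so has a different effect depending on your time zone.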
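
To illustrate why the swap matters, here is a small offline sketch that replays the question's sample output without calling Reddit. It assumes the search index compares against created_utc shifted by 8 hours (28800 s); that offset is inferred from the sample results above and is not confirmed by the answer or by any documentation quoted here. The names FIRST_RESULTS, ASSUMED_OFFSET, indexed_timestamp and simulate_search are hypothetical helpers for this illustration only.

    # (fullname, created_utc) pairs copied from the first result list in the question.
    FIRST_RESULTS = [
        ("t3_4zh8zw", 1472107937),
        ("t3_4zgl1n", 1472096403),
        ("t3_4zgf34", 1472093883),
        ("t3_4zg8de", 1472091260),
        ("t3_4zfzun", 1472087983),
        ("t3_4zfysv", 1472087571),
        ("t3_4zf8hg", 1472077921),
        ("t3_4zf7g6", 1472077542),
        ("t3_4zf4p5", 1472076595),
        ("t3_4zf0d7", 1472075090),
        ("t3_4zeqeg", 1472071708),
        ("t3_4zeomz", 1472071134),
        ("t3_4zebse", 1472066994),
        ("t3_4zduso", 1472061376),
        ("t3_4zdtne", 1472061014),
    ]

    # ASSUMPTION: the index stores created_utc shifted by 8 hours.
    # Inferred from the sample output, not taken from documentation.
    ASSUMED_OFFSET = 8 * 60 * 60


    def indexed_timestamp(created_utc):
        """Hypothetical model of the value the timestamp query compares against."""
        return created_utc + ASSUMED_OFFSET


    def simulate_search(results, upper_bound):
        """Return fullnames (newest first) whose modelled timestamp is <= upper_bound."""
        return [name for name, utc in results if indexed_timestamp(utc) <= upper_bound]


    second_newest_utc = FIRST_RESULTS[1][1]

    # The question's approach: upper bound derived from created_utc.
    ts2_from_created_utc = second_newest_utc - 1
    # The answer's approach: upper bound derived from created (modelled as utc + offset).
    ts2_from_created = indexed_timestamp(second_newest_utc) - 1

    print(simulate_search(FIRST_RESULTS, ts2_from_created_utc)[:3])
    print(simulate_search(FIRST_RESULTS, ts2_from_created)[:3])

Under that assumption, the first print starts at t3_4zebse, which is where the question's second result list starts, and the second print starts at t3_4zgf34, the third-newest submission the asker expected.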