Question

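I've been racking my brain all day and trying lots of things from Stack Overflow to no avail, so I apologize if this is really simple and I've just missed it.

My situation is that my Python bot grabs the post IDs from posts and puts them in a text file.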

subreddit = reddit.subreddit('pythonforengineers')
# limiting the selection to the 20 newest posts
for submission in subreddit.new(limit=20):
    # re.findall is performing the filtering = removing all text but the found keys
    a = re.findall(steamKey15, submission.selftext, re.IGNORECASE)
    b = re.findall(steamKey25, submission.selftext, re.IGNORECASE)
    c = re.findall(steamKey17, submission.selftext, re.IGNORECASE)
    readPostIDFile()
    while submission.id not in steamKeyPostID:
        if a:
            #print(a)
            savePostID()
            saveSteamKey()
            removeDups()
        if b:
            #print(b)
            savePostID()
            saveSteamKey()
            removeDups()
        if c:
            #print(c)
            savePostID()
            saveSteamKey()
            removeDups()
        break

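This is my loop logic. The three patterns are steamKey15/25/17, so I test it on a post that contains all three formats. Naturally it finds all three, but it also writes the post ID to my text file three times.

Here is the logic for saving the post ID: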

def savePostID():
    #print(submission.selftext)                
    #print(submission.id)
    # adds the id to the text file
    steamKeyPostID.append(submission.id)
    with open('steamKeyPostID.txt', 'a') as f:
        for post_id in steamKeyPostID:
            f.write(submission.id + '\n')
            if submission.id not in 'steamKeyPostID.txt':      
                print('Beep. Boop. Bot saving the keys of: ' + '"' + submission.title + '"'+ ' to ---> steamKeys.txt')
                break           
            else:
                print('No keys were found!')
                break

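My goal is either to stop duplicate post IDs from reaching the text file, or to filter the text file to remove duplicates (which would require writing back to the same file). I'm not sure which would be easier, but I've been trying both and failing.

I have tried using OrderedDict and various kinds of set() code. I've also tried modifying the for/if loops and changing what they filter on. I feel like this should be easy, but I keep running into endless errors. I'm using Python 3.7.

Thanks for any help! I may not be back until tomorrow, I need a break.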

Answer 1

Since savePostID() already appends every submission.id to steamKeyPostID, you can just add an if submission.id not in steamKeyPostID: check to avoid duplicates:

def savePostID():
    if submission.id not in steamKeyPostID:
        steamKeyPostID.append(submission.id)
        with open('steamKeyPostID.txt', 'a') as f:
            for post_id in steamKeyPostID:
                f.write(submission.id + '\n')
                ...
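The same guard can be sketched in a self-contained form. This is only an illustration and not the asker's actual bot: the helper names load_post_ids and save_post_id, the sample IDs, and the temporary file path are all made up here. It also assumes a set is acceptable in place of the list, since membership checks are all that is needed.

```python
import os
import tempfile


def load_post_ids(path):
    """Return the set of IDs already stored in the file (empty if the file is missing)."""
    if not os.path.exists(path):
        return set()
    with open(path, "r") as f:
        # one ID per line; blank lines are skipped
        return {line.strip() for line in f if line.strip()}


def save_post_id(post_id, seen_ids, path):
    """Append post_id to the file only if it is new; return True if it was written."""
    if post_id in seen_ids:
        return False
    seen_ids.add(post_id)
    with open(path, "a") as f:
        f.write(post_id + "\n")
    return True


# Usage with hypothetical IDs and a throwaway file
demo_path = os.path.join(tempfile.mkdtemp(), "steamKeyPostID.txt")
seen = load_post_ids(demo_path)
results = [save_post_id(pid, seen, demo_path) for pid in ["abc123", "abc123", "def456"]]
print(results)  # [True, False, True]
```

Because the check and the write live in one function, calling it once per matched key format (as the three if blocks in the question do) still leaves a single line per post in the file.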

Answer 2

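For anyone who stumbles across this from a Google search: I found a solution by preventing duplicates from being added rather than removing them afterwards.

I used the following block of code: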

# this is the logic block for ensuring duplicate posts are not read.
with open('steamKeyPostID.txt', 'r') as f:
    # read the existing .txt file as one string
    steamKeyPostID = f.read()
    # split it into one entry per line
    steamKeyPostID = steamKeyPostID.split('\n')
    # drop empty entries (e.g. from the trailing newline) and keep the result as a list
    steamKeyPostID = list(filter(None, steamKeyPostID))

What this does is load the entire contents of the text file into a list. I then use a while statement to filter against that list:

while submission.id not in steamKeyPostID:

submission.id comes from PRAW, the Reddit API module.

This works perfectly for me: when I rerun the program, it skips every post ID that is already in the text file.
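If the text file already contains duplicates from earlier runs, the other option raised in the question (filter the file and write it back) could be handled with a one-off cleanup along these lines. This is a rough sketch: the function name dedupe_file and the sample IDs are invented for illustration, and it relies on dict preserving insertion order, which the language guarantees from Python 3.7 on.

```python
import os
import tempfile


def dedupe_file(path):
    """Rewrite the file so each non-empty line appears once, keeping first-seen order."""
    with open(path, "r") as f:
        lines = [line.strip() for line in f if line.strip()]
    # dict.fromkeys drops repeats while keeping the original order
    unique = list(dict.fromkeys(lines))
    with open(path, "w") as f:
        f.writelines(line + "\n" for line in unique)
    return unique


# Usage with a throwaway file containing hypothetical, repeated IDs
cleanup_path = os.path.join(tempfile.mkdtemp(), "steamKeyPostID.txt")
with open(cleanup_path, "w") as f:
    f.write("abc123\nabc123\nabc123\ndef456\n\nabc123\n")

remaining = dedupe_file(cleanup_path)
print(remaining)  # ['abc123', 'def456']
```

Running this once before the bot starts, and then only appending IDs that are not already known, should keep the file free of repeats.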

How to remove/prevent duplicate lines in a text file

2 answers: