Python: reading from a file and only doing work when the string is not found

Asked: 2016-08-28 22:04:03

Tags: python praw

I'm trying to make a reddit bot that executes code taken from submissions. I have my own subreddit for controlling these clients.

while __name__ == '__main__':
    string = open('config.txt').read()
    for submission in subreddit.get_new(limit = 1):
        if submission.url not in string:
            f.write(submission.url + "\n")
            f.close()
            f = open('config.txt', "a")
            string = open('config.txt').read()

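What this is supposed to do is read from the config file and only do its work if the submission URL is not already in config.txt. However, it always sees the most recent post and does its work anyway. This is how f is opened: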

if not os.path.exists('file'):
    open('config.txt', 'w').close()
f = open('config.txt', "a")

1 answer:

Answer 0 (score: 0)

First, a critique of your existing code (in the comments):

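# the next two lines are not needed; open('config.txt', "a")
# will create the file if it doesn't exist. They also test for 'file'
# rather than 'config.txt', so config.txt is emptied on every start
# unless something named 'file' exists.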
if not os.path.exists('file'):
    open('config.txt', 'w').close()
f = open('config.txt', "a")

# this is an unusual condition which will confuse readers
while __name__ == '__main__':
    # the next line will open a file handle and never explicitly close it
    # (it will probably get closed automatically when it goes out of scope,
    # but it's not good form)
    string = open('config.txt').read()
    for submission in subreddit.get_new(limit = 1):
        # the next line should check for a full-line match; as written, it 
        # will match "http://www.test.com" if "http://www.test.com/level2"
        # is in config.txt
        if submission.url not in string:
            f.write(submission.url + "\n")
            # the next two lines could be replaced with f.flush()
            f.close()
            f = open('config.txt', "a")
            # this is a cumbersome way to keep your string synced with the file,
            # and it never explicitly releases the new file handle
            string = open('config.txt').read()
    # If subreddit.get_new() doesn't return any results, this will act as
    # a busy loop, repeatedly requesting new results as fast as possible.
    # If that is undesirable, you might want to sleep here.
# file handle f should get closed after the loop
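
To illustrate the full-line-match point from the comments, here is a minimal sketch. The file contents and URLs are made up for the example; it only shows how `in` behaves on one big string versus a set of whole lines.

```python
# Hypothetical file contents, standing in for what config.txt might hold.
contents = "http://www.test.com/level2\n"

url = "http://www.test.com"

# Substring test: matches because url is a prefix of the stored line.
substring_hit = url in contents

# Full-line test: split into lines first, then compare whole entries.
seen = set(contents.splitlines())
full_line_hit = url in seen

print(substring_hit, full_line_hit)  # True False
```

With the substring test, the bot would wrongly treat the shorter URL as already handled; with the set of lines it would not.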

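None of the issues noted above should stop your code from working (except possibly the imprecise matching). But simpler code may be easier to debug. Here is some code that does the same thing. Note: I am assuming no other process can write to config.txt at the same time. You could step through this code (or your own) line by line with pdb to see whether it behaves as expected.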

import time
import praw
r = praw.Reddit(...)
subreddit = r.get_subreddit(...)

if __name__ == '__main__':
    # open config.txt for reading and writing without truncating. 
    # moves pointer to end of file; closes file at end of block
    with open('config.txt', "a+") as f:
        # move pointer to start of file
        f.seek(0) 
        # make a set of existing lines; also move pointer to end of file
        lines = set(f.read().splitlines())

        while True:
            got_one = False
            for submission in subreddit.get_new(limit=1):
                got_one = True
                if submission.url not in lines:
                    lines.add(submission.url)
                    f.write(submission.url + "\n")
                    # write data to disk immediately
                    f.flush()
                    ...
            if not got_one:
                # wait a little while before trying again
                time.sleep(10)
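
The file handling in the loop above can be tried without any Reddit access. The sketch below pulls it into two hypothetical helpers, `load_seen` and `record_if_new` (neither is part of praw or the original code), and runs them against a temporary file. The URLs are placeholders, and reopening the file stands in for restarting the bot.

```python
import os
import tempfile


def load_seen(f):
    """Return the set of lines already stored in a file opened in 'a+' mode."""
    f.seek(0)
    return set(f.read().splitlines())


def record_if_new(f, seen, url):
    """Append url to the file and to seen if it is new; return True if added."""
    if url in seen:
        return False
    seen.add(url)
    f.write(url + "\n")
    # write data to disk immediately
    f.flush()
    return True


# Use a throwaway directory so the example does not touch a real config.txt.
path = os.path.join(tempfile.mkdtemp(), "config.txt")

with open(path, "a+") as f:
    seen = load_seen(f)  # empty on the first run
    first = record_if_new(f, seen, "http://example.com/a")
    repeat = record_if_new(f, seen, "http://example.com/a")

# Reopening the file simulates a restart: the URL should still be known.
with open(path, "a+") as f:
    seen_after_restart = load_seen(f)
    after_restart = record_if_new(f, seen_after_restart, "http://example.com/a")

print(first, repeat, after_restart)  # True False False
```

If the last value printed were True, the stored URLs would not be surviving a restart, which is the symptom described in the question.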