Python: fastest way to find a string in multiple text files (some of them large)

Time: 2017-11-30 16:18:16

Tags: python-3.4

I am trying to search for a string in multiple files. My code works, but it takes several minutes on large text files.

import logging
import os
import sys

wrd = b'my_word'
path = r'C:\path\to\files'  # raw string, so \t and \f are not read as escapes

# Walk the directory that holds all of the .txt files
for f in os.listdir(path):
    if f.strip().endswith('.txt'):
        with open(os.path.join(path, f), 'rb') as ofile:
            # Loop over every line in the file, looking for the search bytes
            for line in ofile:
                if wrd in line:
                    try:
                        sendMail(...)
                        logging.warning('There is an error {} in this file : {}'.format(line, f))
                        sys.exit(0)
                    except IOError as e:
                        logging.error('Operation failed: {}'.format(e.strerror))
                        sys.exit(0)

I found this thread: Python finds a string in multiple files recursively and returns the file path, but it does not answer my question.

Do you know how to speed this up?

I am using Python 3.4 on Windows Server 2003.

Thx;)

1 Answer:

Answer 0: (score: 1)

My files are generated by an Oracle application. If there is an error, I log it and stop generating the files.

So I search for my string by reading each file from the end, because the string I am looking for is an Oracle error and sits at the end of the file.

import logging
import os
import sys

wrd = b'ORA-'
path = r'C:\path\to\files'  # raw string, so \t and \f are not read as escapes

# Walk the directory that holds all of the .txt files
for f in os.listdir(path):
    if f.strip().endswith('.txt'):
        # Binary mode: wrd is bytes, and seeking to a computed offset needs byte positions
        with open(os.path.join(path, f), 'rb') as ofile:
            try:
                ofile.seek(0, 2)                      # seek to the end of the file
                fsize = ofile.tell()                  # file size in bytes
                ofile.seek(max(fsize - 1024, 0), 0)   # jump to the last 1024 bytes
                lines = ofile.readlines()             # read to the end

                lines = lines[-10:]                   # keep the last 10 lines
                for line in lines:
                    if wrd in line:
                        sendMail(...)
                        logging.error('There is an error {} in this file : {}'.format(line, f))
                        sys.exit(0)
            except IOError as e:
                logging.error('Operation failed: {}'.format(e.strerror))
                sys.exit(0)
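
For anyone who wants to reuse the idea, here is a minimal sketch of the same tail-read approach wrapped in two small functions. The names `tail_contains` and `files_with_match` and the `tail_bytes` parameter are my own, not from the code above, and the sketch leaves out the mailing and logging. It also checks the whole tail chunk rather than only its last 10 lines, and it only helps if the string you want is really within the last `tail_bytes` bytes of the file.

```python
import os


def tail_contains(file_path, needle, tail_bytes=1024):
    """Return True if `needle` (bytes) occurs in the last `tail_bytes` bytes of the file.

    Hypothetical helper: only the tail is read, so a match located
    earlier in the file is deliberately not detected.
    """
    with open(file_path, 'rb') as fh:
        fh.seek(0, os.SEEK_END)                           # go to the end of the file
        size = fh.tell()                                  # size in bytes
        fh.seek(max(size - tail_bytes, 0), os.SEEK_SET)   # back up by at most tail_bytes
        return needle in fh.read()                        # search only the tail chunk


def files_with_match(directory, needle, suffix='.txt', tail_bytes=1024):
    """Return the sorted names of files ending in `suffix` whose tail contains `needle`."""
    return [
        name
        for name in sorted(os.listdir(directory))
        if name.endswith(suffix)
        and tail_contains(os.path.join(directory, name), needle, tail_bytes)
    ]
```

You would call it as `files_with_match(path, b'ORA-')` and then mail or log for each returned name. The read cost per file is bounded by `tail_bytes` instead of growing with the file size, which is where the speed-up comes from.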