Python: Search for a string in multiple text files

Time: 2018-11-13 11:31:36

Tags: python python-3.x

What I want to do:

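  1. Extract the sample1.tgz file.
  2. Store the contents in the "sample1" directory.
  3. Search for a string in the text files under sample1/nvram2/logs.

Full path => C:\Users\username\scripts\sample1\nvram2\logs\version.txt

Note: the text files have different extensions.

Example: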

textFile.txt 
textFile.txt.0 
textFile.txt.1 
textFile.log 
textFile

What I have tried:

import os,tarfile, glob

string_to_search=input("Enter the string you want to search : ")

#all_files holds all the files in current directory
all_files = [f for f in os.listdir('.') if os.path.isfile(f)] 
for current_file in all_files: 
    print("Reading " + current_file)

    if (current_file.endswith(".tgz")) or (current_file.endswith("tar.gz")):
        tar = tarfile.open(current_file, "r:gz")
        #file_name contains only name by removing the extension
        file_name=os.path.splitext(current_file)[0]
        os.makedirs(file_name) #make directory with the file name
        output_file_path=file_name  #Path to store the files after extraction
        tar.extractall(output_file_path) #extract the current file
        tar.close()
        #---Following code is to find  the string from all the files in a directory---
        path=output_file_path + '\nvram2\logs\*'
        files=glob.glob(path)

        for file1 in files: 
            with open(file1) as f2:
                for line in f2:
                    if string_to_search in line:
                        #print file name which contains the string
                        print(file1)
                        #print the line which contains the string
                        print(str(line))

Problem:

I think the problem is with the path. When I run the code with the following path, it works.

path='\nvram2\logs\*.txt'

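But that only checks files with the '.txt' extension, and I want to search files with any extension.

When I try the following code, it does not work. Here output_file_path contains sample1, i.e. the directory name.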

path=output_file_path + '\nvram2\logs\*'

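3 Answers:

Answer 0: (score: 1)

After extracting the archive to a folder, you can use os.walk to visit every file under a given path and check each one for the string.

Sample code: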

import os

# Extract tar file
# ...
# ...

path = os.path.join(output_file_path, 'nvram2', 'logs')

for dirpath, dirs, files in os.walk(path):
    # dirpath : current dir path
    # dirs : directories found in current dir path
    # files : files found in current dir path

    # iterate each files
    for file in files:

        # build actual path of the file by joining to dirpath
        file_path = os.path.join(dirpath, file)

        # open file
        with open(file_path) as file_desc:

            # iterate over each line, enumerate is used to get line count
            for ln_no, line in enumerate(file_desc):
                if string_to_search in line:
                    print('Filename: {}'.format(file))
                    print('Text: {}'.format(line.strip()))
                    print('Line No: {}\n'.format(ln_no + 1))
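
The extraction step elided in the sample above could be sketched as follows. The helper name `extract_archive` and its `dest_root` parameter are assumptions made for illustration and are not part of the original answer; the helper names the output directory after the archive with the `.tgz` or `.tar.gz` suffix removed.

```python
import os
import tarfile


def extract_archive(archive_path, dest_root="."):
    """Extract a .tgz/.tar.gz archive into a directory named after it.

    Hypothetical helper: returns the directory the files were extracted into.
    """
    base = os.path.basename(archive_path)
    # Strip the archive suffix so "sample1.tgz" becomes "sample1"
    for suffix in (".tar.gz", ".tgz"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    output_dir = os.path.join(dest_root, base)
    # exist_ok avoids an error when the script is run a second time
    os.makedirs(output_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters that
            # reject members trying to escape output_dir
            tar.extractall(output_dir, filter="data")
        else:
            # Older versions: only extract archives you trust
            tar.extractall(output_dir)
    return output_dir
```

The returned directory can then serve as `output_file_path` in the walk above.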

Answer 1: (score: 1)

Here is the complete code that solves the problem:

import os,tarfile, glob

string_to_search=input("Enter the string you want to search : ")

#all_files holds all the files in current directory
all_files = [f for f in os.listdir('.') if os.path.isfile(f)] 
for current_file in all_files: 
    if (current_file.endswith(".tgz")) or (current_file.endswith("tar.gz")):
        tar = tarfile.open(current_file, "r:gz")
        #file_name contains only name by removing the extension
        file_name=os.path.splitext(current_file)[0] 
        os.makedirs(file_name) #make directory with the file name
        output_file_path=file_name  #Path to store the files after extraction
        tar.extractall(output_file_path) #extract the current file
        tar.close()

        #----Following code is to find  the string from all the files in a directory
        path1=output_file_path + r'\nvram2\logs'
        for my_file1 in glob.glob(os.path.join(path1,"*")):
            if os.path.isfile(my_file1): # to discard folders
                with open(my_file1, errors='ignore') as my_file2:
                    # start=1 so the reported line numbers are 1-based
                    for line_no, line in enumerate(my_file2, start=1):
                        if string_to_search in line:
                            print(string_to_search + " is found in " + my_file1 + "; Line Number = " + str(line_no))

I got help from this answer. The problem of the path and files not being found was solved by "joining the directory with the file name."
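
To illustrate why the path in the question failed: in an ordinary Python string literal, `\n` is the newline escape, so `'\nvram2\logs\*'` begins with a line break rather than a backslash followed by `nvram2`. A minimal sketch, with the directory name `sample1` taken from the question and the variable names `broken`, `raw` and `pattern` chosen only for illustration:

```python
import os

output_file_path = "sample1"

# "\n" is parsed as a newline, so this path contains a line break
broken = output_file_path + "\nvram2\\logs\\*"
print("\n" in broken)  # True

# A raw string keeps every backslash literally
raw = output_file_path + r"\nvram2\logs\*"
print("\n" in raw)  # False

# os.path.join inserts the platform's separator and avoids escapes entirely
pattern = os.path.join(output_file_path, "nvram2", "logs", "*")
print(pattern)
```

The raw-string form works on Windows only, while the `os.path.join` form adapts to the platform's separator.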

Answer 2: (score: 0)

You can add a condition that checks whether '.txt' appears in the file name:

import os

log_dir = os.path.join(output_file_path, 'nvram2', 'logs')

for file1 in os.listdir(log_dir):
    if '.txt' in file1:
        # os.listdir returns bare names, so join them with the directory
        with open(os.path.join(log_dir, file1)) as f2:
            for line in f2:
                if string_to_search in line:
                    # print file name which contains the string
                    print(file1)
                    # print the line which contains the string
                    print(str(line))
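
Note that this check covers only part of what the question asks for. A quick sketch using the sample file names from the question shows which names the `'.txt' in file1` test accepts; the variable names `names`, `matched` and `skipped` are illustrative:

```python
names = ["textFile.txt", "textFile.txt.0", "textFile.txt.1", "textFile.log", "textFile"]

# Names that pass the substring check
matched = [name for name in names if ".txt" in name]
print(matched)
# ['textFile.txt', 'textFile.txt.0', 'textFile.txt.1']

# Names the check silently ignores
skipped = [name for name in names if ".txt" not in name]
print(skipped)
# ['textFile.log', 'textFile']
```

To search every file regardless of extension, drop the condition and keep only an `os.path.isfile` check, as Answer 1 does.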