Question

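I need to search thousands of files for specific strings, metadata, hex markers, and so on, but the Python code below only searches a single file, and even that takes a long time: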

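blabla = "blabla"  # placeholder for the string being searched for

def check():
    # Close the file automatically when the search is done
    with open('example.txt') as datafile:
        found = False
        for line in datafile:
            if blabla in line:
                found = True
                break

    return found

found = check()
if found:
    print("true")
else:
    print("false")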

Any suggestions? Thanks.

Answer 1

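Make the file name/path a parameter of the function. Your function can then handle any file, not just one specific file. Then call the function for each file you want it to process. You will probably want to build a list of the file names/paths to process, and then loop over it, doing whatever you need for each file.

For example: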

def check(fname):
    datafile = open(fname)
    found = False
    # ...
    return found

files = ['a', 'b', 'c']
for fname in files:
    found = check(fname)
    if found:
        print("true")
    else:
        print("false")
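
To fill in the elided body, here is a minimal sketch of what the complete function could look like. It assumes the search string is passed in as a second parameter; the name `needle` and the helper `files_containing` are made up for this illustration and do not come from the question. Treating unreadable files as "not found" is also just an assumption.

```python
import os
import tempfile


def check(fname, needle):
    """Return True if any line of the file at fname contains needle."""
    try:
        # errors="replace" keeps odd bytes from raising UnicodeDecodeError
        with open(fname, errors="replace") as datafile:
            # any() stops reading at the first matching line
            return any(needle in line for line in datafile)
    except OSError:
        # Assumption: unreadable or missing files count as "not found"
        return False


def files_containing(paths, needle):
    """Return the subset of paths whose contents contain needle."""
    return [path for path in paths if check(path, needle)]


# Small demo on throwaway files so the sketch runs anywhere
with tempfile.TemporaryDirectory() as tmp:
    hit = os.path.join(tmp, "a.txt")
    miss = os.path.join(tmp, "b.txt")
    with open(hit, "w") as f:
        f.write("first line\nhas blabla here\n")
    with open(miss, "w") as f:
        f.write("nothing to see\n")
    candidates = [hit, miss, os.path.join(tmp, "missing.txt")]
    demo_result = files_containing(candidates, "blabla")
    print([os.path.basename(p) for p in demo_result])
```

Returning the list of matching paths, rather than printing true/false per file, makes it easier to see which of the thousands of files actually matched.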

Answer 2

Assuming the files are all contained in the directory "/foo":

import os
import re

# Use re.findall() on the whole file to avoid line-by-line parsing
myrex = re.compile('blabla')

def check(filename):
    with open(filename) as myfile:
        matches = myrex.findall(myfile.read())
        return len(matches) > 0

os.chdir("/foo")
# Use os.walk() to find the names of all files under this directory
for root, dirs, files in os.walk('.'):
    for fname in files:
        # Join with root so files in subdirectories can be opened too
        path = os.path.join(root, fname)
        print(path + ": " + str(check(path)))

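If the files are stored in several locations, wrap the os.chdir() block in an additional loop over those locations. If you are searching for multiple patterns, compile each one with its own re.compile().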
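
As a rough sketch of those two extensions, the code below walks several starting directories and tests each file against several compiled patterns. The pattern names and the helpers `matching_patterns` and `search_roots` are hypothetical. Opening the files in binary mode is an assumption on my part, not something the code above does; it is there so that hex markers like the ones mentioned in the question can be matched as byte patterns.

```python
import os
import re
import tempfile

# Hypothetical patterns: a plain string and a hex marker (JPEG start bytes)
PATTERNS = {
    "blabla": re.compile(rb"blabla"),
    "jpeg_marker": re.compile(rb"\xff\xd8\xff"),
}


def matching_patterns(path, patterns=PATTERNS):
    """Return the names of the patterns found in the file at path."""
    with open(path, "rb") as myfile:
        data = myfile.read()  # whole file in memory, not line by line
    return [name for name, rex in patterns.items() if rex.search(data)]


def search_roots(roots, patterns=PATTERNS):
    """Walk every root directory and map each matching file to its hits."""
    results = {}
    for root_dir in roots:  # the extra loop over locations
        for root, dirs, files in os.walk(root_dir):
            for fname in files:
                path = os.path.join(root, fname)
                hits = matching_patterns(path, patterns)
                if hits:
                    results[path] = hits
    return results


# Demo on two temporary directories standing in for real locations
with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
    with open(os.path.join(d1, "text.txt"), "wb") as f:
        f.write(b"some blabla text")
    with open(os.path.join(d2, "image.bin"), "wb") as f:
        f.write(b"\xff\xd8\xff\xe0 fake header")
    for path, hits in sorted(search_roots([d1, d2]).items()):
        print(os.path.basename(path), hits)
```

Passing full paths built with os.path.join() means no os.chdir() is needed in this variant.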

Does this help answer your question?

Answer 3

You may want to look at glob or os.walk to retrieve the file names, but something like:

import fileinput

blabla = 'blabla'  # placeholder for the string being searched for
print(any(blabla in line for line in fileinput.input(['some', 'list', 'of', 'file', 'names'])))

This automatically reads the files in sequence, and short-circuits as soon as one line tests true.
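
A slightly fuller sketch of the same idea, assuming the file names come from glob and that you also want to know where the first match was found. `first_match` is a made-up helper; `filename()` and `filelineno()` are the standard fileinput calls that report the current file and the line number within it.

```python
import fileinput
import glob
import os
import tempfile


def first_match(filenames, needle):
    """Return (filename, line number) of the first line containing needle."""
    if not filenames:
        # An empty list would make fileinput read standard input instead
        return None
    with fileinput.input(files=filenames) as lines:
        for line in lines:
            if needle in line:
                # Returning here stops reading, like any() short-circuiting
                return lines.filename(), lines.filelineno()
    return None


# Demo on throwaway files collected with glob
with tempfile.TemporaryDirectory() as tmp:
    for name, text in [("a.txt", "nothing\n"), ("b.txt", "x\nblabla here\n")]:
        with open(os.path.join(tmp, name), "w") as f:
            f.write(text)
    names = sorted(glob.glob(os.path.join(tmp, "*.txt")))
    found = first_match(names, "blabla")
    print(os.path.basename(found[0]), found[1])
```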

Answer 4

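If all the files are in one directory, you can get them with os.listdir(), which gives you a list of the names of everything in that directory, for example os.listdir('/home/me/myData'). From there you can access each one. If you are on a Unix-based system, grep is a very powerful tool that gives you a lot more flexibility. You might want grep -r "your query" ./ > results.txt. This gives you every line that matches your search, with the option of using regular expressions, and saves the results to a file. Otherwise, to search a lot of files using only Python: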

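import os

def check(x):
    return "blabla" in x

directory = '/home/me/files'
# os.listdir() returns bare names, so join them with the directory
files = [os.path.join(directory, name) for name in os.listdir(directory)]
for f in files:
    with open(f, "r") as fh:
        x = fh.read()
    print(check(x))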

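My check function behaves differently: it does not check line by line, and the result is printed as True or False (capitalized).

I imagine you will want to know which file a result came from (and which line?).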

for f in files:
    with open(f, "r") as fh:
        lines = fh.read().split('\n')
    for count, line in enumerate(lines):
        if check(line):
            # count is an int, so convert it before concatenating
            print(f + " " + str(count) + " " + line)

...or whatever else you need to know.
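
Pulling those pieces together, one possible shape for this is a generator that yields the file, the line number, and the line for every hit. This is only a sketch: `find_lines` is a hypothetical name, line numbers are 1-based here, subdirectories are skipped, and undecodable bytes are replaced rather than handled properly.

```python
import os
import tempfile


def find_lines(directory, needle):
    """Yield (path, line number, line) for each line containing needle."""
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # os.listdir() also returns subdirectories
        with open(path, "r", errors="replace") as fh:
            # Iterating over the file reads one line at a time
            for number, line in enumerate(fh, start=1):
                if needle in line:
                    yield path, number, line.rstrip("\n")


# Demo on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "notes.txt"), "w") as f:
        f.write("first\nsecond blabla\nthird\nblabla again\n")
    os.mkdir(os.path.join(tmp, "subdir"))
    for path, number, line in find_lines(tmp, "blabla"):
        print(os.path.basename(path), number, line)
```

Because it is a generator, results show up as they are found instead of after every file has been read.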

Searching for strings and metadata in multiple files
