I have a string like 'apple'. I want to find this string, and I know it exists in one of several hundred files, e.g.
file1
file2
file3
file4
file5
file6
...
file200
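All of these files are in the same directory. What is the best way to find which file contains this string using Python, given that exactly one file contains it?
I came up with this: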
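import os

for file in os.listdir(directory):
    # os.listdir returns bare names, so join them with the directory
    f = open(os.path.join(directory, file))
    for line in f:
        if 'apple' in line:  # test each line, not the file object
            print("FOUND")
    f.close()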
and this:
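import glob
import subprocess
# Popen does not expand wildcards without a shell, so expand file* with glob first
grep = subprocess.Popen(['grep', '-m1', 'apple'] + glob.glob(directory + '/file*'), stdout=subprocess.PIPE)
found = grep.communicate()[0]
print(found)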
Answer 0: (score: 8)
Given that the files are all in the same directory, we just need a listing of the current directory.
import os

for fname in os.listdir('.'):    # change directory as needed
    if os.path.isfile(fname):    # make sure it's a file, not a directory entry
        with open(fname) as f:   # open file
            for line in f:       # process line by line
                if 'apple' in line:    # search for string
                    print('found string in file %s' % fname)
                    break
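This gets the current directory listing and checks that each entry is a file (not a directory).
It then opens the file and reads it line by line (rather than all at once, to avoid memory problems), looking for the target string in each line.
When it finds the target string, it prints the file name.
Also, since the files are opened using with, they are closed automatically when we are done (or when an exception occurs).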
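Since only one file contains the string, the search can stop at the first hit instead of moving on to the remaining files. Here is a minimal sketch of that idea, wrapping the same line-by-line scan in a function. The name `find_file_containing` and the `errors="ignore"` setting are assumptions made for illustration, not part of the answer above.

```python
import os


def find_file_containing(needle, directory="."):
    """Return the name of the first regular file in `directory` whose text
    contains `needle`, or None if no file does."""
    for fname in sorted(os.listdir(directory)):
        path = os.path.join(directory, fname)  # listdir returns bare names
        if not os.path.isfile(path):           # skip subdirectories
            continue
        # Assumption: ignore undecodable bytes rather than fail on odd files.
        with open(path, errors="ignore") as f:
            for line in f:                     # read line by line
                if needle in line:
                    return fname               # stop at the first match
    return None
```

Calling `find_file_containing('apple', '.')` would then give back the matching file name, or `None` when nothing matches.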
Answer 1: (score: 2)
import os

for x in os.listdir(path):
    with open(os.path.join(path, x)) as f:  # join, since listdir returns bare names
        if 'apple' in f.read():
            # your work
            break
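Reading each file with `f.read()` loads its whole contents into memory. If the files may be large, one possible alternative is to memory-map each file and search its raw bytes. The sketch below assumes the target is given as a bytes object; the function name `find_file_with_bytes` is hypothetical.

```python
import mmap
import os


def find_file_with_bytes(needle, directory="."):
    """Return the name of the first regular file in `directory` whose raw
    bytes contain `needle` (a bytes object), or None if no file does."""
    for fname in sorted(os.listdir(directory)):
        path = os.path.join(directory, fname)
        # mmap cannot map an empty file, so skip those along with non-files.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            continue
        with open(path, "rb") as f:
            # Length 0 maps the whole file; ACCESS_READ keeps it read-only.
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                if mm.find(needle) != -1:
                    return fname
    return None
```

Because this compares bytes, the search string has to be encoded the same way as the files, for example `find_file_with_bytes(b'apple', path)`.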
Answer 2: (score: 1)
For simplicity, assume your files are in the current directory:
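import os

def whichFile(query):
    for root, dirs, files in os.walk('.'):
        for file in files:
            # os.walk also descends into subdirectories, so build the full path
            with open(os.path.join(root, file)) as f:
                if query in f.read():
                    return file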
Answer 3: (score: 0)
A lazily evaluated, itertools-based approach:
import os
from itertools import repeat, chain

gen = (file for file in os.listdir("."))
gen = (file for file in gen if os.path.isfile(file) and os.access(file, os.R_OK))
gen = (zip(repeat(file), open(file)) for file in gen)  # zip is lazy in Python 3 (izip in Python 2)
gen = chain.from_iterable(gen)
gen = (file for file, line in gen if "apple" in line)
gen = set(gen)  # this step consumes the whole pipeline
for file in gen:
    print(file)
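The pipeline above is lazy until `set(gen)` consumes it, at which point every file is read. When only one file can match, pulling a single result with `next()` keeps the search lazy and stops early. The following is a rough sketch of that variation, not the answer's own code; `files_containing` and `first_match` are hypothetical names, and `errors="ignore"` is an assumption.

```python
import os


def files_containing(needle, directory="."):
    """Lazily yield the name of each readable regular file in `directory`
    that contains `needle`."""
    for fname in sorted(os.listdir(directory)):
        path = os.path.join(directory, fname)
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            continue
        # The with block closes each file once it has been scanned.
        with open(path, errors="ignore") as f:
            # any() stops reading this file at the first matching line.
            if any(needle in line for line in f):
                yield fname


def first_match(needle, directory="."):
    # next() pulls a single result; files after the match are never opened.
    return next(files_containing(needle, directory), None)
```

Unlike the generator chain, this version closes every file it opens, and `first_match('apple')` returns `None` rather than raising when nothing is found.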