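I am working on a program that scans a directory containing a list of text files. Whenever a file contains "color=", the program computes a fuzzy-match value between the file name and the first line of that file.
What I need: find the maximum fuzzy value, then print the first 5 lines of the file that has it.
My code finds all the fuzzy values, but I don't know how to find the maximum and then print the first 5 lines of the file with the maximum fuzzy value. Please help!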
import os
from fuzzywuzzy import fuzz

path = r'C:\Python27'
data = {}
for dir_entry in os.listdir(path):
    dir_entry_path = os.path.join(path, dir_entry)
    if os.path.isfile(dir_entry_path):
        with open(dir_entry_path, 'r') as my_file:
            for line in my_file:
                for part in line.split():
                    if "color=" in part:
                        print part
                        string1= "Filename:", dir_entry_path
                        print(string1)
                        string2= "Start line of file:", list(my_file)[0]
                        print(string1)
                        string3=(fuzz.ratio(string1, string2))
                        print(string3)
My output currently looks like this:
"color="
('Filename:', 'C:\\Python27\\maybeee.py')
('Filename:', 'C:\\Python27\\maybeee.py')
20
"color="
('Filename:', 'C:\\Python27\\mayp.py')
('Filename:', 'C:\\Python27\\mayp.py')
28
part.startswith('color='):
('Filename:', 'C:\\Python27\\mayp1.py')
('Filename:', 'C:\\Python27\\mayp1.py')
29
Here the maximum value is 29, so I need to print the first 5 lines of the file that has that value. Please help! Any answer would be appreciated.
Answer 0 (score: 1)
Your code tries to re-read the whole file (in list(my_file)[0]) while an iterator over that file is already in use. That is troublesome.
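To illustrate why that is troublesome, here is a minimal sketch in Python 3 syntax (unlike the Python 2 print statements in the question). It assumes an in-memory io.StringIO object as a stand-in for a real file so the snippet is self-contained. Calling list() on the object being iterated drains the lines that have not been yielded yet, so the result does not start with the file's first line and the outer loop ends early.

```python
import io


def demo_exhaustion():
    # io.StringIO stands in for an open text file (illustrative assumption).
    fake_file = io.StringIO("line 1\nline 2\nline 3\n")
    seen = []
    leftovers = None
    for line in fake_file:
        seen.append(line)
        # list() consumes whatever the iterator has not yielded yet.
        leftovers = list(fake_file)
    return seen, leftovers


seen, leftovers = demo_exhaustion()
print(seen)       # only the first line was visited by the loop
print(leftovers)  # the remaining lines, not the first line of the file
```

If the match happened on the last line, the drained list would be empty and indexing it with [0] would raise an IndexError.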
It is better to store the first 5 lines (that is what you are asking for, right?) in a variable and print them when your condition matches.
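One possible way to capture those lines up front is itertools.islice, sketched here in Python 3 syntax. The helper name read_first_lines and the in-memory sample are hypothetical, chosen only for illustration. This is an alternative to collecting the lines inside the main loop, and it assumes the file is opened separately for this purpose.

```python
import io
from itertools import islice


def read_first_lines(file_obj, count=5):
    """Return up to `count` leading lines without reading the rest."""
    return list(islice(file_obj, count))


# io.StringIO stands in for an open text file (illustrative assumption).
sample = io.StringIO("a\nb\nc\nd\ne\nf\ng\n")
first5lines = read_first_lines(sample)
print(first5lines)
```

A file with fewer than five lines simply yields a shorter list, so no special handling is needed for short files.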
Also, you are printing string1 twice.
Change the loop to:
from collections import defaultdict

filenames2fuzz = defaultdict(list)
for dir_entry in os.listdir(path):
    dir_entry_path = os.path.join(path, dir_entry)
    if os.path.isfile(dir_entry_path):
        first5lines = []
        condition_matched_in_file = False
        with open(dir_entry_path, 'r') as my_file:
            for line_nbr, line in enumerate(my_file):
                if line_nbr < 5:
                    first5lines.append(line)
                for part in line.split():
                    if "color=" in part:
                        print part
                        string1= "Filename:", dir_entry_path
                        print(string1)
                        condition_matched_in_file = True
                        fuzziness = fuzz.ratio(string1, first5lines[0])
                        filenames2fuzz[dir_entry_path].append(fuzziness)
                        print(fuzziness)
        if condition_matched_in_file:
            print('\n'.join(first5lines))

# Now that you have a dictionary that holds all filenames with
# their fuzziness values, you can easily find the first 5 lines again
# of the file that has the best fuzziness value.
best_fuzziness_ratio = 0  # as far as I can tell, the docs indicate it is between 0 and 100
for k, v in filenames2fuzz.items():
    if max(v) > best_fuzziness_ratio:
        best_fuzzy_file = k
        best_fuzziness_ratio = max(v)

print('File {} has the highest fuzzy value '
      'of {}. \nThe first 5 lines are:\n'
      ''.format(best_fuzzy_file, best_fuzziness_ratio))
with open(best_fuzzy_file) as f:
    for i in range(5):
        print(f.readline())
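The search for the best file can also be written with the built-in max() and a key function. The sketch below uses Python 3 syntax; the helper name best_file is hypothetical, and the sample scores loosely mirror the question's output with one invented extra value. It returns None when no file matched, a case in which the loop above would leave best_fuzzy_file undefined.

```python
def best_file(filenames2fuzz):
    """Return (filename, best_score), or None if no file matched."""
    if not filenames2fuzz:
        return None
    # Compare files by the highest score recorded for each of them.
    name = max(filenames2fuzz, key=lambda k: max(filenames2fuzz[k]))
    return name, max(filenames2fuzz[name])


# Sample data for illustration only; the value 11 is invented.
scores = {"maybeee.py": [20], "mayp.py": [28, 11], "mayp1.py": [29]}
print(best_file(scores))
```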
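You could optimize this further (have a look at os.walk), but without a better explanation of the problem (details about the files you are looping over, a sample of their contents), this is the best I can do.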
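As a rough sketch of the os.walk suggestion, the generator below yields every file under a directory, in Python 3 syntax. The name iter_files is hypothetical. Note that, unlike os.listdir, os.walk also descends into subdirectories, which may or may not be what the question needs.

```python
import os


def iter_files(root):
    """Yield the path of every regular file under root, recursively."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)
```

Looping over iter_files(path) could then replace the os.listdir loop together with its os.path.isfile check, since os.walk already separates files from directories.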