Find the largest file among multiple random files

Date: 2014-03-08 03:36:09

Tags: python python-2.7

  def line_count(filename):
      for filename in os.walk(os.path.abspath('my directory filename')):
          lines = 0
          with open(filename) as file:
              lines = len([line for line in file.readlines() if line.strip() != ''])
          print lines

  def find_big_files(files):
      file_sizes = [(line_count(file), file) for file in files]
      print sorted(file_sizes, key = lambda file_size: file_size[0], reverse = True)

  sorted_files = find_big_files(file)

It doesn't work.
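One concrete reason the question's code fails: `os.walk` yields `(dirpath, dirnames, filenames)` tuples, not file paths, so `open(filename)` is handed a tuple and raises a `TypeError`. A minimal sketch of building the list of paths instead, assuming every file under the directory (recursively) should be included; `collect_files` is a hypothetical helper name:

```python
import os

def collect_files(root):
    """Return the absolute path of every file under root, recursively."""
    paths = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory
    for dirpath, dirnames, filenames in os.walk(os.path.abspath(root)):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return paths
```

The returned list can then be passed to a `find_big_files(files)` function that expects file paths.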

2 answers:

Answer 0 (score: 0)

Since you are looking for the LONGEST file rather than the BIGGEST one, do this:

def get_length(file):
    len_ = 0
    with open(file, 'r') as f:
        for line in f:
            len_ += 1
    return len_

files = [file for file in however_you_build_your_list]
# reverse=True puts the file with the most lines first
files = sorted(files, key=get_length, reverse=True)
# files[0] is now the longest
# files[-1] is now the shortest
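If you only need the single longest file, sorting the whole list does more work than necessary; `max()` with the same kind of key function picks it in one pass. A sketch, assuming `paths` is a list of readable text-file paths; `longest_file` is a hypothetical name:

```python
def get_length(path):
    """Count lines by iterating over the file, without loading it all at once."""
    with open(path) as f:
        return sum(1 for _ in f)

def longest_file(paths):
    """Return the path with the most lines; on a tie, the first one in paths wins."""
    return max(paths, key=get_length)
```

`max()` returns the first maximal item it encounters, so ties resolve to the earliest path in the list.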

Answer 1 (score: 0)

Do you count blank lines as lines?

If so, the following gives you the raw number of lines in the file, blank ones included:

  def line_count(filename):
      lines = 0
      with open(filename) as file:
           lines = len(file.readlines())
      return lines

If not, change `lines = ...` to:

  lines = len([line for line in file.readlines() if line.strip() != ''])
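Both variants can be folded into one function. A minimal sketch with a hypothetical `skip_blank` flag that chooses whether whitespace-only lines are counted; it iterates over the file instead of calling `readlines()`, so a large file is never held in memory all at once:

```python
def line_count(filename, skip_blank=False):
    """Count lines; if skip_blank is True, ignore empty or whitespace-only lines."""
    count = 0
    with open(filename) as f:
        for line in f:
            # line.strip() is '' for empty and whitespace-only lines
            if skip_blank and not line.strip():
                continue
            count += 1
    return count
```

For a file containing `"a\n\n  \nb"`, `line_count(path)` gives 4 and `line_count(path, skip_blank=True)` gives 2.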

The rest of the code then looks like this:

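  def find_big_files(files):
      largest        = (0, None)
      second_largest = (0, None)
      for file in files:
          size = line_count(file)
          if size > largest[0]:
              second_largest = largest
              largest        = (size, file)
          elif size > second_largest[0]:
              # doesn't beat the leader, but may still be the runner-up
              second_largest = (size, file)
      return largest, second_largest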
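The standard library's `heapq.nlargest` does the same top-two bookkeeping without hand-written comparisons. A sketch, assuming a non-blank `line_count` like the one above (redefined here so the block runs on its own); `find_two_biggest` is a hypothetical name, and unlike the version above it returns a list of up to two tuples rather than padding with `(0, None)`:

```python
import heapq

def line_count(filename):
    """Count non-blank lines, streaming the file line by line."""
    with open(filename) as f:
        return sum(1 for line in f if line.strip())

def find_two_biggest(files):
    """Return up to two (count, filename) tuples, largest first."""
    counted = [(line_count(name), name) for name in files]
    # Compare on the count only; as with sorted(..., reverse=True), ties keep input order
    return heapq.nlargest(2, counted, key=lambda pair: pair[0])
```

Changing the `2` gives the top N files for any N.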

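Note that this is inherently inefficient, because it has to open every file and read through it, so the cost grows with the total number of lines across all files (O(files × lines per file)). But if the line count is really what you care about, there is no way around that, at least for generic .txt files and the like.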
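If "biggest" means size on disk rather than number of lines, the operating system already records that, so no file has to be opened or read. A sketch using `os.path.getsize`; `biggest_by_bytes` is a hypothetical name:

```python
import os

def biggest_by_bytes(paths):
    """Return (size_in_bytes, path) tuples, largest first, without reading any file."""
    sizes = [(os.path.getsize(p), p) for p in paths]
    return sorted(sizes, key=lambda pair: pair[0], reverse=True)
```

This only looks at file metadata, so it stays fast even for very large files, but it cannot tell you anything about line counts.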

If you want the whole list, ordered from fewest lines to most:

  def find_big_files(files):
      file_sizes = [(line_count(file), file) for file in files] 
      return sorted(file_sizes, key = lambda file_size: file_size[0])

This returns a list of (line_count, file_name) tuples; list[-1] is the largest, list[-2] the second largest, and so on.

Edit

The OP asked me to post the whole code as a single block that solves the problem, so here it is:

  def line_count(filename):
      lines = 0
      with open(filename) as file:
           lines = len([line for line in file.readlines() if line.strip() != ''])
      return lines

  def find_big_files(files):
      file_sizes = [(line_count(file), file) for file in files] 
      return sorted(file_sizes, key = lambda file_size: file_size[0], reverse = True)

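The [(count, filename), ...] list returned by `result = find_big_files(files)` runs from largest to smallest, so result[0] is the largest, result[1] the second largest, and so on. Files with the same count keep the order in which they appear in the input list of file paths.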
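The tie behavior comes from Python's sort being stable even with `reverse=True`. A small illustration using hypothetical counts, where `b.txt` and `c.txt` both have 7 lines:

```python
# Hypothetical (count, filename) pairs, in input order
file_sizes = [(3, "a.txt"), (7, "b.txt"), (7, "c.txt")]

# Same sort call as find_big_files: largest count first
result = sorted(file_sizes, key=lambda file_size: file_size[0], reverse=True)
print(result)  # [(7, 'b.txt'), (7, 'c.txt'), (3, 'a.txt')]
```

Because the key is only the count, `b.txt` stays ahead of `c.txt`; sorting the whole tuples instead would break ties by filename.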