Question

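I am trying to count lines in two separate directories that are laid out differently and compare the results. In the first directory, all of the required text files sit directly in a single directory, but the second directory contains subdirectories that I need to iterate over: save each subdirectory's name, then pull and count every txt file whose name starts with that name.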

Currently, I can't get the subdirectory name to match against the txt files that start with it. My traceback is as follows:

be29X1(149)% ./SeriesCount.py
Traceback (most recent call last):
  File "./SeriesCount.py", line 23, in <module>
    for fn in files('subdir_name*.txt'):
TypeError: 'list' object is not callable

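I don't need to keep the subdirectory name permanently, since all I care about is storing the txt file names and their counts in a dict. For example, if the directory name is "regprices", I want to pull the line counts of all text files in that directory that start with "regprices". The code is as follows: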

#!/usr/bin/env python

import csv
import copy
import os
import sys
import glob
import dircmp

#set dicts 
dict1 = {}
dict2 = {}
final_dict = {}

#parses through directory 1, counts lines, saves to a dict
for fn in glob.glob('/data/*.txt'):
    with open(fn) as f:
        dict1[fn] = [1 for line in f if line.strip() and not line.startswith('#')]

#parses through subdirectories in directory 2, counts lines, saves to a dict
for subdir, dirs, files in os.walk('/docs/prod/count/'):
    subdir_name = os.getcwd()
    for fn in files('subdir_name*.txt'):
        dict2[fn] = [1 for line in f if line.strip() and not line.startswith('#')]

#compare dicts, overwrite counts from dict1 with dict2, save to final dict

#save final dictionary with key/val pairs to a csv
with open('seriescount.csv', 'w') as f:
    w = csv.DictWriter(f, final_dict) 
    w.writeheader()
    w.writerow({k:sum(v) for k, v in final_dict.items()})

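Bonus points if you can help with the syntax for comparing the two dictionaries, overwriting the counts in dir1 with those from dir2, and saving them to final_dict.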

Answer 1

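The example uses inconsistently named variables all over the place. It is definitely not a working example, and it is hard to figure out what you are trying to achieve.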

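I'm not sure how you intended to compare the dictionary keys by file name. Here is an attempt at guessing what you are trying to achieve.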

import glob
import os


def count_lines(filename):
  with open(filename,'r') as f:
    count = sum(1 for line in f if line.strip() and line[0] != '#')
  return count

def directory1_count(path='/data/*.txt'):
  counts = {}
  for filepath in glob.glob(path):
    directory, filename = os.path.split(filepath)
    name, extension = os.path.splitext(filename)
    counts[name] = count_lines(filepath)
  return counts

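def directory2_count(path='/docs/prod/count/'):
  counts = {}
  for directory, dirs, files in os.walk(path):
    # Last path component of the directory currently being walked.
    # For the top-level path with a trailing slash this is '', which
    # matches every file name in the startswith() check below.
    _, subdir = os.path.split(directory)
    # files is a plain list of names, so filter it rather than call it
    for filename in [x for x in files if x.startswith(subdir) and x.endswith('.txt')]:
      name, extension = os.path.splitext(filename)
      filepath = os.path.join(directory,filename)
      counts[name] = count_lines(filepath)
  return counts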

counts = directory1_count()
counts.update(directory2_count())
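
As for the TypeError in the question: os.walk yields files as a plain list of file names, so files('subdir_name*.txt') tries to call a list, and the quoted 'subdir_name' is a literal string, not the variable. Below is a minimal sketch, assuming Python 3, of matching that list against a pattern and of the CSV step the question asks about. The helper names matching_txt_files, merge_counts and write_counts and the column headers name and count are made up for illustration. It also writes one row per file instead of the single wide row in the question.

```python
import csv
import fnmatch
import glob
import os


def matching_txt_files(directory, files):
    """Return the names in files that start with the directory's own
    name and end in .txt (hypothetical helper)."""
    # normpath drops any trailing slash before taking the last component
    subdir = os.path.basename(os.path.normpath(directory))
    # glob.escape keeps characters such as [ or * in the directory name
    # from being read as wildcards; fnmatch.filter then applies the
    # pattern to the list of names that os.walk produced
    return fnmatch.filter(files, glob.escape(subdir) + '*.txt')


def merge_counts(dict1, dict2):
    """Return a new dict in which counts from dict2 overwrite dict1."""
    final_dict = dict(dict1)
    final_dict.update(dict2)
    return final_dict


def write_counts(counts, csv_path):
    """Write one name,count row per entry, sorted by name."""
    # newline='' is what the csv module expects for files it writes
    with open(csv_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['name', 'count'])  # assumed column headers
        for name in sorted(counts):
            writer.writerow([name, counts[name]])
```

Inside the os.walk loop this would be used as for filename in matching_txt_files(directory, files), and at the end as write_counts(merge_counts(dict1, dict2), 'seriescount.csv'). The merge only overwrites entries when both dicts use the same kind of key, for example the file name without its extension as in the code above.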

Python: iterating over subdirectories to count lines