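I am trying to count the lines of the files in two separate directories that are laid out differently, and then compare the results. In the first directory, all the text files I need sit in a single directory, but the second directory contains subdirectories that I need to iterate over: save each subdirectory's name, then pull and count all the txt files whose names start with that name.
Right now I can't match the subdirectory name against the txt files that start with it. My traceback is below: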
be29X1(149)% ./SeriesCount.py
Traceback (most recent call last):
  File "./SeriesCount.py", line 23, in <module>
    for fn in files('subdir_name*.txt'):
TypeError: 'list' object is not callable
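I don't need to keep the subdirectory names permanently, since all I care about is storing the txt file names and their counts in a dict. For example, if the directory is named "regprices", I want the line counts of every text file in that directory whose name starts with "regprices". The code is below: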
#!/usr/bin/env python
import csv
import copy
import os
import sys
import glob
import dircmp
#set dicts
dict1 = {}
dict2 = {}
final_dict = {}
#parses through directory 1, counts lines, saves to a dict
for fn in glob.glob('/data/*.txt'):
    with open(fn) as f:
        dict1[fn] = [1 for line in f if line.strip() and not line.startswith('#')]
#parses through subdirectories in directory 2, counts lines, saves to a dict
for subdir, dirs, files in os.walk('/docs/prod/count/'):
    subdir_name = os.getcwd()
    for fn in files('subdir_name*.txt'):
        dict2[fn] = [1 for line in f if line.strip() and not line.startswith('#')]
#compare dicts, overwrite counts from dict1 with dict2, save to final dict
#save final dictionary with key/val pairs to a csv
with open('seriescount.csv', 'w') as f:
    w = csv.DictWriter(f, final_dict)
    w.writeheader()
    w.writerow({k:sum(v) for k, v in final_dict.items()})
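Bonus points if you can help with the syntax for comparing the two dictionaries, overwriting the counts from dir1 with those from dir2, and saving the result to final_dict.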
Answer 0 (score: 0)
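The example uses differently named variables all over the place, and it is definitely not a working example, so it is hard to tell what you are trying to achieve. The TypeError itself arises because the files value yielded by os.walk is a plain list of file names, so files('subdir_name*.txt') tries to call a list. Note also that 'subdir_name*.txt' is a literal string rather than the variable's value, and that os.getcwd() returns the current working directory, not the directory being walked.
I am not sure how you intend to compare the dictionary keys based on the file names. Here is an attempt at guessing what you want to achieve.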
import glob
import os

def count_lines(filename):
    # Count the non-blank lines that do not start with '#'
    with open(filename,'r') as f:
        count = sum(1 for line in f if line.strip() and line[0] != '#')
    return count

def directory1_count(path='/data/*.txt'):
    # Map each txt file's base name (without extension) to its line count
    counts = {}
    for filepath in glob.glob(path):
        directory, filename = os.path.split(filepath)
        name, extension = os.path.splitext(filename)
        counts[name] = count_lines(filepath)
    return counts

def directory2_count(path='/docs/prod/count/'):
    counts = {}
    for directory, dirs, files in os.walk(path):
        # Last path component of the directory being walked
        # (empty for the top-level path because of its trailing slash)
        _, subdir = os.path.split(directory)
        # Keep only the txt files whose names start with that directory name
        for filename in [x for x in files if x.startswith(subdir) and x.endswith('.txt')]:
            name, extension = os.path.splitext(filename)
            filepath = os.path.join(directory,filename)
            counts[name] = count_lines(filepath)
    return counts

# Counts from directory 2 overwrite those from directory 1 on matching names
counts = directory1_count()
counts.update(directory2_count())
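For the bonus part (comparing the two dictionaries, letting the directory 2 counts win, and saving to a CSV), something along these lines might work. This is only a sketch under a few assumptions: Python 3, and a two-column "name,count" layout that I picked for illustration. The helper names merge_counts, changed_keys and write_counts_csv are made up here and are not part of the original script.

```python
import csv

def merge_counts(dir1_counts, dir2_counts):
    """Return a new dict in which counts from dir2 override those from dir1."""
    merged = dict(dir1_counts)    # copy, so the input dicts are left untouched
    merged.update(dir2_counts)    # on shared keys the dir2 value wins
    return merged

def changed_keys(dir1_counts, dir2_counts):
    """Return the sorted names found in both dicts whose counts differ."""
    shared = set(dir1_counts) & set(dir2_counts)
    return sorted(k for k in shared if dir1_counts[k] != dir2_counts[k])

def write_counts_csv(counts, csv_path):
    """Write a header row, then one 'name,count' row per entry, sorted by name."""
    # newline='' is what the csv module recommends for files it writes (Python 3)
    with open(csv_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['name', 'count'])   # assumed column layout
        for name in sorted(counts):
            writer.writerow([name, counts[name]])
```

With the functions from the answer above, usage would look roughly like final_dict = merge_counts(directory1_count(), directory2_count()) followed by write_counts_csv(final_dict, 'seriescount.csv'). One row per file is used instead of csv.DictWriter with a single wide row, since that tends to be easier to read once there are many files.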