Python: summing numbers by their preceding identifier in a messy, unordered file

Time: 2018-09-19 00:13:27

Tags: python-3.x

I have a file in this format:

a11      0.0
a12    132.0
b13      0.0
b42    584.0
randomstuff
etc
a11      0.0
a12      6.0
b13    138.0
b42      6.0

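There are thousands of a##, b##, c##, etc. combinations, and they repeat over and over with useless lines in between. I want to add up all the numbers for each item so that I end up with just: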

a11, 0
a12, 138
b13, 138
b42, 590

I need some way to generate each identifier (a11, a12, etc.), because there are thousands of different ones.

1 answer:

Answer 0: (score: 1)

To generate all the combinations, a simple approach is three nested loops:

for letter in 'abcdefghijklmnopqrstuvwxyz':
    for digit1 in '0123456789':
        for digit2 in '0123456789':
            print(letter + digit1 + digit2)

which generates a00 through z99.
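The same enumeration can also be written with `itertools.product`, which avoids the nesting. This is only a sketch of an alternative; the helper name `all_codes` is made up for illustration and is not part of the original answer.

```python
from itertools import product
from string import ascii_lowercase, digits


def all_codes():
    """Yield every identifier from a00 to z99 (hypothetical helper)."""
    # product() walks the letter/digit/digit combinations in the same
    # order as three nested for-loops would.
    for letter, digit1, digit2 in product(ascii_lowercase, digits, digits):
        yield letter + digit1 + digit2


codes = list(all_codes())
print(len(codes), codes[0], codes[-1])  # 2600 a00 z99
```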

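To parse this data, though, it is probably easier to check that each input line follows the format and then sum the values into a dictionary. That way the identifiers never have to be generated up front: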

code_sums = {}  # maps identifier (e.g. "a11") -> running total
with open("input_file.txt", "rt") as fin:
    for row in fin:
        # split() with no arguments splits on any run of spaces or tabs
        fields = row.split()
        # a valid line has exactly two fields: identifier and value
        if len(fields) != 2:
            continue
        code, value = fields
        # identifier must be one lowercase letter followed by two digits
        if (len(code) == 3 and
                code[0] in 'abcdefghijklmnopqrstuvwxyz' and
                code[1].isdigit() and
                code[2].isdigit()):
            try:
                float_val = float(value)
            except ValueError:
                continue  # value is not a number, skip the malformed line
            # looks like we have valid input, tally the value
            code_sums[code] = code_sums.get(code, 0.0) + float_val

#for code in code_sums.keys():
#    print("%s -> %7.1f" % (code, code_sums[code]))

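# TODO - handle errors; "with" at least closes the file if a write fails
with open("output_file.csv", "wt") as fout:
    fout.write("Code,Sum\n")
    for code, total in code_sums.items():
        # %.1f rather than %7.1f, so the CSV field has no padding spaces
        fout.write("%s,%.1f\n" % (code, total))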
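For comparison, the validate-and-tally idea can be condensed with a regular expression and `collections.defaultdict`. This is a rough sketch rather than a drop-in replacement: the pattern assumes every valid line is exactly one lowercase letter, two digits, whitespace, and a plain decimal number, and the names `LINE_RE` and `sum_by_code` are invented here. It is run against the sample data from the question.

```python
import re
from collections import defaultdict

# Assumed line format: letter + two digits, whitespace, decimal number.
LINE_RE = re.compile(r"^([a-z][0-9][0-9])\s+(-?[0-9]+(?:\.[0-9]+)?)$")


def sum_by_code(lines):
    """Return {identifier: total} for lines matching the assumed format."""
    totals = defaultdict(float)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # anything else (e.g. "randomstuff") is ignored
            totals[match.group(1)] += float(match.group(2))
    return dict(totals)


sample = """\
a11      0.0
a12    132.0
b13      0.0
b42    584.0
randomstuff
etc
a11      0.0
a12      6.0
b13    138.0
b42      6.0
""".splitlines()

for code, total in sum_by_code(sample).items():
    print(f"{code}, {total:g}")
# a11, 0
# a12, 138
# b13, 138
# b42, 590
```

Because the dictionary keys come from the file itself, only identifiers that actually occur are reported, in the order they first appear.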