Question

I have a file in this format:

a11      0.0
a12    132.0
b13      0.0
b42    584.0
randomstuff
etc
a11      0.0
a12      6.0
b13    138.0
b42      6.0

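There are thousands of combinations such as a##, b##, c## and so on, and they repeat over and over with useless content in between. I want to add up all the numbers for each identifier, so that I end up with only: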

a11, 0
a12, 138
b13, 138
b42, 590

I need some way to generate each identifier (a11, a12, etc.), because there are thousands of different ones.

Answer 1

To generate all the combinations, a simple approach is three nested loops:

for letter in 'abcdefghijklmnopqrstuvwxyz':
    for digit1 in '0123456789':
        for digit2 in '0123456789':
            print(letter + digit1 + digit2)

This generates a00 through z99.
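
The full a00 to z99 range can also be built without explicit nesting. This is only a minimal sketch, assuming the standard-library itertools.product and the string constants; the name all_codes is made up for illustration:

```python
import itertools
import string

# Cartesian product of one lowercase letter and two digits,
# with each (letter, digit, digit) tuple joined into a 3-character code.
all_codes = [
    letter + d1 + d2
    for letter, d1, d2 in itertools.product(
        string.ascii_lowercase, string.digits, string.digits
    )
]

print(all_codes[0], all_codes[-1], len(all_codes))  # prints: a00 z99 2600
```

Using string.ascii_lowercase avoids typing the alphabet by hand, which is an easy place to drop a letter.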

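But to parse this data, it is probably easier to check whether each input line follows the format and then tally the values in a dictionary, so the identifiers never need to be generated up front: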

code_sums = {}  # maps each code (e.g. "a11") to its running total
with open("input_file.txt", "rt") as fin:
    for row in fin:
        # split() with no arguments strips the line and collapses any
        # run of spaces or tabs into a single separator
        fields = row.split()
        # verify there are only two values in the line
        if len(fields) != 2:
            continue
        code, value = fields
        # check the code is one lowercase letter followed by two digits
        if (len(code) == 3 and
                code[0] in 'abcdefghijklmnopqrstuvwxyz' and
                code[1].isdigit() and
                code[2].isdigit()):
            try:
                float_val = float(value)
            except ValueError:
                continue  # probably a malformed input line
            # looks like we have valid input, tally the value
            code_sums[code] = code_sums.get(code, 0.0) + float_val

#for code in code_sums:
#    print("%s -> %7.1f" % (code, code_sums[code]))

with open("output_file.csv", "wt") as fout:  # TODO - handle errors
    fout.write("Code,Sum\n")
    for code in code_sums:
        # no width padding, so the CSV field has no leading spaces
        fout.write("%s,%.1f\n" % (code, code_sums[code]))
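
As a cross-check, the same tally can be sketched with a regular expression and collections.defaultdict, run here on the sample data from the question. The names LINE_RE and sum_codes are hypothetical, and the pattern assumes plain decimal values (no exponents), which is stricter than float():

```python
import re
from collections import defaultdict

# Assumed line format: one lowercase letter, two digits, whitespace,
# then a plain decimal number. Lines that do not match are ignored.
LINE_RE = re.compile(r"^([a-z][0-9]{2})\s+(-?[0-9]+(?:\.[0-9]+)?)$")

def sum_codes(lines):
    """Return {code: total} for every line that matches LINE_RE."""
    totals = defaultdict(float)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:  # anything else (e.g. "randomstuff") is skipped
            totals[match.group(1)] += float(match.group(2))
    return dict(totals)

sample = """\
a11      0.0
a12    132.0
b13      0.0
b42    584.0
randomstuff
etc
a11      0.0
a12      6.0
b13    138.0
b42      6.0
"""

totals = sum_codes(sample.splitlines())
for code in sorted(totals):
    print("%s, %g" % (code, totals[code]))
```

On the sample this prints a11, 0; a12, 138; b13, 138; b42, 590 (one per line), matching the output asked for in the question.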

Python: adding up numbers by their preceding identifier in a messy, unordered file
