Merging two text files into a dictionary in Python

Time: 2017-11-01 18:06:30

Tags: python dictionary text merge

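I have two text files that I want to merge into a dictionary, so that it contains every line of the first text file, with the values from the first text file followed by the values from the second text file (where available). Can anyone help?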

For example,

a.txt looks like this:

apple 5 7e-6 na 2.2
banana 9 3e-2 2 2.1
orange na 9.2 2.1 na

b.txt looks like this:

orange 5 6.2 na 6e-3 nd
mango 4 7.3 na 7 3
apple 4 4.4 4.3 na 2

I want to merge these two text files in Python so that I get output like this:

apple 5 7e-6 na 2.2 4 4.4 4.3 na 2
banana 9 3e-2 2 2.1 na na na na na
orange na 9.2 2.1 na 5 6.2 na 6e-3 nd

I tried building one dictionary per text file and then combining them, using the following code:

with open('a.txt', 'r') as document:
    a = {}
    for line in document:
        if line.strip():  # non-empty line?
            key, value = line.split(None, 1) 
            a[key] = value.split()

with open('b.txt', 'r') as document:
    b = {}
    for line in document:
        if line.strip():  # non-empty line?
            key, value = line.split(None, 1) 
            b[key] = value.split()

def combineDict(*args):
    result = {}
    for dic in args:
        for key in (result.viewkeys() | dic.keys()):
            if key in dic:
                if type(dic[key]) is list:
                    result.setdefault(key, []).extend(dic[key])
    return result

final = combineDict(a,b)

But this also keeps every key that appears only in the second text file (such as 'mango').

1 answer:

Answer 0: (score: 0)

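Read in one file at a time, split each line into a key and its values, and then append those values to that key's entry in the dictionary. Keys that do not appear in the first file are skipped.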

#!/usr/bin/env python3

import io

file_a = io.StringIO("""apple 5 7e-6 na 2.2
banana 9 3e-2 2 2.1
orange na 9.2 2.1 na""")

file_b = io.StringIO("""orange 5 6.2 na 6e-3 nd
mango 4 7.3 na 7 3
apple 4 4.4 4.3 na 2""")

def dict_from_file(f):
    def line_splitter(line):
        items = line.strip().split()
        return items[0], items[1:]
    # To use filenames, change 'f' above to filename and
    # use this line:
    # with open(filename) as f:
    return {k:v
            for line in f
            for k, v in (line_splitter(line),)}

# Read in initial file
d = dict_from_file(file_a)
# For each additional file append values for already
# existing keys. Extend all values to be the same
# length, filling with 'na'.
files = (file_b,)
for f in files:
    max_length = 0
    for k, v in dict_from_file(f).items():
        if k in d:
            d[k].extend(v)
            max_length = max(max_length, len(d[k]))
    for v in d.values():
        v += ['na'] * (max_length - len(v))

for k, v in d.items():
    print(('{:12} ' + '{:6}' * len(v)).format(k, *v))

Output:

apple        5     7e-6  na    2.2   4     4.4   4.3   na    2     
banana       9     3e-2  2     2.1   na    na    na    na    na    
orange       na    9.2   2.1   na    5     6.2   na    6e-3  nd
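The same idea can be sketched for files on disk rather than `io.StringIO` objects. The helper names `read_table` and `left_merge` below are hypothetical and chosen only for illustration, and the sketch assumes that missing rows should be padded to the width of the widest row in the second file. The demo writes the question's sample data to temporary files so that it runs on its own.

```python
import os
import tempfile


def read_table(path):
    """Return {first_field: [remaining fields]} for each non-empty line."""
    table = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                table[fields[0]] = fields[1:]
    return table


def left_merge(first, second, fill='na'):
    """Keep only the keys of `first`; append values from `second` or padding."""
    # Assumption: pad up to the widest row found in `second`.
    width = max((len(v) for v in second.values()), default=0)
    merged = {}
    for key, values in first.items():
        extra = second.get(key, [])
        merged[key] = values + extra + [fill] * (width - len(extra))
    return merged


# Demo using temporary copies of the sample data from the question.
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, 'a.txt')
    path_b = os.path.join(tmp, 'b.txt')
    with open(path_a, 'w') as f:
        f.write('apple 5 7e-6 na 2.2\nbanana 9 3e-2 2 2.1\norange na 9.2 2.1 na\n')
    with open(path_b, 'w') as f:
        f.write('orange 5 6.2 na 6e-3 nd\nmango 4 7.3 na 7 3\napple 4 4.4 4.3 na 2\n')
    merged = left_merge(read_table(path_a), read_table(path_b))

for key, values in merged.items():
    print(key, ' '.join(values))
```

Because the loop iterates over the keys of the first dictionary only, 'mango' never enters the result. Rows of unequal length within the first file are not padded here; that would need an extra pass.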