Python - Merging two tables

Time: 2015-10-07 08:01:41

Tags: python python-3.x file-io

I want to merge the data in two tables using Python (Python 3.4). My sample data is shown below, along with the result table I would like to get.

[Table 1]

Name1 Name2
AAAA XXXX
BBBB YYYY
CCCC ZZZZ

[Table 2]

Index1 Sample1 Sample2 Sample3
AAAA 10 20 30
BBBB 25 25 25
CCCC 30 31 32
XXXX 27 29 31
YYYY 45 21 56
ZZZZ 48 24 10

[Result table]

Index2 Sample1 Sample2 Sample3
AAAA+XXXX 37 49 61
BBBB+YYYY 70 46 81
CCCC+ZZZZ 78 55 42

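Although this seems like a simple problem, I cannot find a good solution because I am new to Python and not familiar with its libraries. It would probably be easy with SQL on a database, but I want to solve it without a DB. Does anyone have a good idea?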

2 answers:

Answer 0 (score: 2)

The following csv approach works for your sample data:

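import csv

# Load table 2 into a dict keyed by the index column, e.g. {'AAAA': ['10', '20', '30'], ...}
with open('table2.txt', 'r') as f_table2:
    csv_table2 = csv.reader(f_table2, delimiter=' ', skipinitialspace=True)
    table2_header = next(csv_table2)
    table2_data = {cols[0]: cols[1:] for cols in csv_table2 if cols}

# newline='' is what the csv module recommends for files passed to csv.writer
with open('table1.txt', 'r') as f_table1, open('output.csv', 'w', newline='') as f_output:
    csv_table1 = csv.reader(f_table1, delimiter=' ', skipinitialspace=True)
    table1_header = next(csv_table1)  # skip the header row of table 1
    csv_output = csv.writer(f_output)
    csv_output.writerow(table2_header)

    # For each name pair, look up both rows in table 2 and add them column by column
    for cols in csv_table1:
        if not cols:
            continue  # ignore blank lines
        sums = [int(x) + int(y) for x, y in zip(table2_data[cols[0]], table2_data[cols[1]])]
        csv_output.writerow(['{}+{}'.format(cols[0], cols[1])] + sums)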

This will give you an output CSV file as follows:

Index1,Sample1,Sample2,Sample3
AAAA+XXXX,37,49,61
BBBB+YYYY,70,46,81
CCCC+ZZZZ,78,55,42

Tested with Python 3.4.3.
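The same dictionary-lookup idea can be tried without creating any files. The following is only a minimal sketch under that assumption: `merge_tables` is a hypothetical helper name (it is not part of the answer above), and it takes each table as a list of text lines instead of reading `table1.txt` and `table2.txt`.

```python
def merge_tables(pairs_lines, values_lines):
    """Sum the sample columns of table 2 for each name pair listed in table 1.

    Both arguments are lists of whitespace-separated text lines whose
    first line is a header row (an assumption made for this sketch).
    """
    # Build a lookup from index name to its integer sample values.
    values = {}
    for line in values_lines[1:]:
        cols = line.split()
        if cols:  # ignore blank lines
            values[cols[0]] = [int(x) for x in cols[1:]]

    result = []
    for line in pairs_lines[1:]:
        cols = line.split()
        if not cols:
            continue
        first, second = cols[0], cols[1]
        # Add the two rows column by column.
        sums = [a + b for a, b in zip(values[first], values[second])]
        result.append(['{}+{}'.format(first, second)] + sums)
    return result


table1 = ["Name1 Name2", "AAAA XXXX", "BBBB YYYY", "CCCC ZZZZ"]
table2 = [
    "Index1 Sample1 Sample2 Sample3",
    "AAAA 10 20 30",
    "BBBB 25 25 25",
    "CCCC 30 31 32",
    "XXXX 27 29 31",
    "YYYY 45 21 56",
    "ZZZZ 48 24 10",
]

for row in merge_tables(table1, table2):
    print(*row)
```

A name that appears in table 1 but not in table 2 raises a `KeyError` here, which is probably the behaviour you want while checking the input data.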

Answer 1 (score: 1)

If you are working in pure Python (no third-party libraries such as numpy), you can do it like this:

class Entry:
    def __init__(self, index, sample1, sample2, sample3):
        self.index = index
        self.sample1 = sample1
        self.sample2 = sample2
        self.sample3 = sample3

    def __add__(self, other):
        return '{index2} {sample1} {sample2} {sample3}'.format(
            index2=self.index + '+' + other.index,
            sample1=self.sample1 + other.sample1,
            sample2=self.sample2 + other.sample2,
            sample3=self.sample3 + other.sample3,
        )


def read_table(path_to_data):
    def extract_body(content):
        # Skip the header row; split() copes with repeated spaces, and blank lines are ignored
        return [e.split() for e in content[1:] if e.strip()]

    with open(path_to_data, 'r') as f:
        content = f.readlines()
    return extract_body(content)


content1 = read_table('data1.txt')
content2 = read_table('data2.txt')

entries = [Entry(e[0], int(e[1]), int(e[2]), int(e[3])) for e in content2]

# output
print('Index2 Sample1 Sample2 Sample3')

for line in content1:
    entry1 = next(e for e in entries if e.index == line[0])
    entry2 = next(e for e in entries if e.index == line[1])

    print(entry1 + entry2)
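One design point in the class above: `__add__` returns a formatted string, so the sum of two entries cannot itself be added to a third entry. A possible variation, sketched here as an assumption and not as the answerer's code, keeps the samples in a list, returns a new `Entry` from `__add__`, and moves the formatting into `__str__`. It also replaces the linear `next(...)` search with a dict lookup.

```python
class Entry:
    def __init__(self, index, samples):
        self.index = index
        self.samples = list(samples)  # any number of sample columns

    def __add__(self, other):
        if not isinstance(other, Entry):
            return NotImplemented
        # Return a new Entry so that sums can be chained.
        return Entry(
            self.index + '+' + other.index,
            [a + b for a, b in zip(self.samples, other.samples)],
        )

    def __str__(self):
        return ' '.join([self.index] + [str(s) for s in self.samples])


# Two rows from the sample data, already split into columns.
rows = [['AAAA', '10', '20', '30'], ['XXXX', '27', '29', '31']]

# A dict keyed by index gives constant-time lookups.
by_index = {r[0]: Entry(r[0], [int(x) for x in r[1:]]) for r in rows}

combined = by_index['AAAA'] + by_index['XXXX']
print(combined)  # AAAA+XXXX 37 49 61
```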