Concatenate 2 files with the same id using Python or Bash

Time: 2017-06-26 11:41:37

Tags: python bash

I have two files:

The first is called file1: date,name,age

The second is called file2: date,name,age

Here is an example:

file1.csv:

2015/1/2,Jina,17
2015/1/3,JJ,25
2015/1/4,Carole,8

file2.csv:

2015/1/1,Rouba,14
2015/1/2,GG,78
2015/1/3,James,7
2015/1/4,Elie,15

I need to join these two files on matching dates. For this example, the output should be:

filex.txt:

2015/1/1,Rouba,14
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8 

Any help?

4 answers:

Answer 0: (score: 1)

file1.csv:

2015/1/2,Jina,17
2015/1/3,JJ,25
2015/1/4,Carole,8

file2.csv:

2015/1/1,Rouba,14
2015/1/2,GG,78
2015/1/3,James,7
2015/1/4,Elie,15

Your solution:

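import pandas as pd

# Name all three columns and use the date column as the index.
# header=None: the files have no header row; dtype=str keeps the ages as written.
df1 = pd.read_csv('file1.csv', names=["Date", "Name", "Age"], index_col=0,
                  header=None, dtype=str)
df2 = pd.read_csv('file2.csv', names=["Date", "Name", "Age"], index_col=0,
                  header=None, dtype=str)

# Align both frames on the date index; the file2 columns come first.
df = pd.concat([df2, df1], axis=1)

df.to_csv('filex.csv', header=False)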

filex.csv:

2015/1/1,Rouba,14,,
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8

If you want to collapse the repeated commas in filex.csv:

import re

with open('filex.csv', 'r') as desc:
    filex = re.sub(',+', ',', desc.read())

with open('filex.txt', 'w') as desc:
    desc.write(filex)

filex.txt:

2015/1/1,Rouba,14,
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8
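
As the filex.txt above shows, collapsing the commas still leaves a trailing comma on the row that has no match in file1.csv. A minimal sketch of an alternative clean-up that drops empty fields line by line is below. The function name drop_empty_fields is made up for illustration, and it assumes that no field is meant to be empty on purpose (the regex above makes the same assumption).

```python
def drop_empty_fields(text):
    """Return text with empty comma-separated fields removed from every line."""
    cleaned = []
    for line in text.splitlines():
        # Keep only the fields that actually contain something
        fields = [field for field in line.split(",") if field != ""]
        cleaned.append(",".join(fields))
    # Put the newline back at the end of every line
    return "".join(line + "\n" for line in cleaned)


# Sample rows in the shape written by df.to_csv above
sample = "2015/1/1,Rouba,14,,\n2015/1/2,GG,78,Jina,17\n"
print(drop_empty_fields(sample), end="")
```

To use it on the real file, read filex.csv into a string, pass it through the function, and write the result to filex.txt, the same way the re.sub version does.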

Answer 1: (score: 0)

If you are on Linux, here is a short one-liner using awk:

awk -F, 'NR==FNR{ a[$1]=$2 FS $3; next }{ if($1 in a) $0=$0 OFS a[$1] }1' file1 OFS=',' file2

Output:

2015/1/1,Rouba,14
2015/1/2,GG,78,Jina,17
2015/1/3,James,7,JJ,25
2015/1/4,Elie,15,Carole,8
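
For readers who do not use awk, the one-liner above roughly corresponds to the Python sketch below: the first pass stores the file1 fields in a dictionary keyed by date, and the second pass prints every file2 row, extended when its date is known. The function name append_matches and the inline sample data are only for illustration, and the sketch assumes each date appears at most once in file1.

```python
import csv
import io


def append_matches(lookup_rows, main_rows):
    """Extend each main row with the fields of the lookup row sharing its date."""
    # First pass (the NR==FNR block in awk): remember everything after the date
    extra = {row[0]: row[1:] for row in lookup_rows if row}
    # Second pass: emit every main row, with the stored fields appended if any
    return [row + extra.get(row[0], []) for row in main_rows if row]


# Inline copies of the example files, so the sketch runs without touching disk
file1 = io.StringIO("2015/1/2,Jina,17\n2015/1/3,JJ,25\n2015/1/4,Carole,8\n")
file2 = io.StringIO("2015/1/1,Rouba,14\n2015/1/2,GG,78\n2015/1/3,James,7\n2015/1/4,Elie,15\n")

for row in append_matches(csv.reader(file1), csv.reader(file2)):
    print(",".join(row))
```

With real files, the two StringIO objects would be replaced by open('file1') and open('file2').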

Answer 2: (score: 0)

Try:

awk -F, 'NR==FNR{a[$1]=$2 FS $3;next}{printf("%s%s\n",$0,a[$1]?","a[$1]:"");}' file1  file2 > filex

EDIT: Adding a non-one-liner form of the solution, with an explanation.

awk -F, 'FNR==NR{                                     ### -F, sets the field separator to a comma; FNR==NR is true only while the first input file (file1) is being read.
                a[$1]=$2 FS $3;                       ### store in array a, indexed by $1 (the date), the value $2 FS $3, where FS is the field separator (a comma here).
                next                                  ### next is an awk built-in keyword that skips the remaining statements for the current line.
                }
                {
                printf("%s%s\n",$0,a[$1]?","a[$1]:"") ### print $0 (the current line of file2) and check whether array a has a value for index $1;
                }                                     ### if it does, also print a comma and that value, otherwise print nothing more.
        ' file1 file2 > filex                         ### the input files file1 and file2, with the output redirected to filex.

Answer 3: (score: 0)

A solution without pandas, although it is longer:

import csv

file1_list = []
with open('file1', 'r') as file1:
    reader = csv.reader(file1)
    file1_list = [item for item in reader]


file2_list = []
with open('file2', 'r') as file2:
    reader = csv.reader(file2)
    file2_list = [item for item in reader]


for item in file1_list:
    print(item[0])


result = []

for item_1 in file1_list:
    for item_2 in file2_list:
        if item_1[0] == item_2[0]:
            item_1.extend(item_2[1:])
            result.append(item_1)

for item_1 in file2_list:
    flag = True
    for item_2 in result:
        if item_1[0] == item_2[0]:
            flag = False
    if flag:
        result.append(item_1)

for item_1 in file1_list:
    flag = True
    for item_2 in result:
        if item_1[0] == item_2[0]:
            flag = False
    if flag:
        result.append(item_1)

print(result)
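
The nested loops above can also be written with a dictionary lookup. The sketch below is one way to do that, not a drop-in replacement: it puts the file2 fields first (as in the expected output of the question), appends rows that exist only in file1 at the end, and assumes each date appears at most once per file. The names outer_join and join_files are invented for this example.

```python
import csv


def outer_join(main_rows, extra_rows):
    """Join rows on their first field, keeping rows found in only one input."""
    # Map each date in the extra input to the rest of its row
    extra = {row[0]: row[1:] for row in extra_rows if row}
    merged = []
    for row in main_rows:
        if row:
            # pop() removes matched dates, so only unmatched ones stay in extra
            merged.append(row + extra.pop(row[0], []))
    # Rows that exist only in the extra input go at the end
    merged.extend([date] + rest for date, rest in extra.items())
    return merged


def join_files(main_path, extra_path, out_path):
    """Read two CSV files, join them on the first column, write the result."""
    with open(main_path, newline="") as main, open(extra_path, newline="") as extra:
        merged = outer_join(csv.reader(main), csv.reader(extra))
    with open(out_path, "w", newline="") as out:
        csv.writer(out, lineterminator="\n").writerows(merged)
```

For the example in the question, the call would be join_files('file2', 'file1', 'filex.txt').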