Merge two files on matching dates

Time: 2017-07-12 06:54:03

Tags: python bash

I have two files, file1 and file2, which I need to merge into filex based on the date. Here is an example:

file1:

20150122,735620,iamSelected,CIG,20150122,735620,4.40902,-0.4255319148934609,-3.0,iamSelected,GRH,20150122,735620,0.62,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150124,735622,iamSelected,B,20150124,735622,7,-3
20150125,735623,iamSelected,K,20150125,735622,10,6.5

file2:

20150122,735620,iamSelected,CIGG,20150122,735620,4.40902,-0.4255319148934609,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150125,735623,iamSelected,B,20150125,735623,7,-3
20150126,735624,iamSelected,KK,20150126,735624,10,6.5

The output filex should look like this:

filex:

20150122,735620,iamSelected,CIG,20150122,735620,4.40902,-0.4255319148934609,-3.0,iamSelected,GRH,20150122,735620,0.62,-3.0,iamSelected,CIGG,20150122,735620,4.40902,-0.4255319148934609,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150124,735622,iamSelected,B,20150124,735622,7,-3
20150125,735623,iamSelected,K,20150125,735622,10,6.5,iamSelected,B,20150125,735623,7,-3
20150126,735624,iamSelected,KK,20150126,735624,10,6.5

I tried:

os.system("awk -F, 'NR==FNR{ a[$1]=$2 FS $3; next }{ if($1 in a) $0=$0 OFS a[$1] }1' file1 OFS=',' file2 >output")

But it doesn't work! Any help?

3 answers:

Answer 0 (score: 1)

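Your awk code doesn't work: a[$1]=$2 FS $3 stores only the second and third fields of the first file, keyed on $1, and lines that appear only in the first file are never printed. The solution below uses the compound key $1 OFS $2 (if that is not the right key, remove OFS $2 from the hash references), removes those two fields from $0, and hashes the rest of the line as the data.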

Try this:

$ awk 'BEGIN{FS=OFS=","} NR==FNR{k=$1 OFS $2;sub(/^([^,]+,){2}/,"");a[k]=$0;next}{print $0 (a[$1 OFS $2]==""?"":OFS) a[$1 OFS $2];delete a[$1 OFS $2]}END{for(i in a)print i,a[i]}' file2 file1
20150122,735620,iamSelected,CIG,20150122,735620,4.40902,-0.4255319148934609,-3.0,iamSelected,GRH,20150122,735620,0.62,-3.0,iamSelected,CIGG,20150122,735620,4.40902,-0.4255319148934609,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150124,735622,iamSelected,B,20150124,735622,7,-3
20150125,735623,iamSelected,K,20150125,735622,10,6.5,iamSelected,B,20150125,735623,7,-3
20150126,735624,iamSelected,KK,20150126,735624,10,6.5

Explanation:

$ awk '
BEGIN { FS=OFS="," }                                # delimiters
NR==FNR {                                           # file2
    k=$1 OFS $2                                     # construct key for hashing
    sub(/^([^,]+,){2}/,"")                          # remove 2 first fields
    a[k]=$0                                         # hash
    next
}
{                                                   # file1
    print $0 (a[$1 OFS $2]==""?"":OFS) a[$1 OFS $2] # merge and print
    delete a[$1 OFS $2]                             # delete hash entry
}
END {                                               # process non-referred hash entries
    for(i in a)
        print i,a[i]
}' file2 file1
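The same compound-key merge can be sketched in Python for comparison. This is a minimal sketch, not the answerer's code: it assumes every line has at least three comma-separated fields and that the first two (date and day number) form the key. The name merge_by_key is hypothetical.

```python
import re

# Mirrors awk's sub(/^([^,]+,){2}/,""): drop the first two fields of a line
FIRST_TWO_FIELDS = re.compile(r"^(?:[^,]+,){2}")


def merge_by_key(lines1, lines2):
    """Append to each lines1 entry the data of the lines2 entry with the same
    (field 1, field 2) key; unmatched lines2 entries are appended at the end."""
    table = {}
    for line in lines2:                          # like NR==FNR: hash file2
        f = line.split(",")
        key = f[0] + "," + f[1]                  # compound key $1 OFS $2
        table[key] = FIRST_TWO_FIELDS.sub("", line)
    merged = []
    for line in lines1:                          # then walk file1
        f = line.split(",")
        data = table.pop(f[0] + "," + f[1], "")  # pop = look up + delete
        merged.append(line + ("," + data if data else ""))
    for key, data in table.items():              # like the END block
        merged.append(key + "," + data)
    return merged
```

Usage: merge_by_key(open('file1').read().splitlines(), open('file2').read().splitlines()). Unlike awk's for (i in a), whose iteration order is unspecified, a Python dict iterates in insertion order (Python 3.7+), so leftover file2 lines keep their original order.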

Answer 1 (score: 0)

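A short solution using the join command (join expects both files to be sorted on the join field, which these already are):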

join -j1 -t, -a1 -a2  file1 file2 > filex

filex contents:

20150122,735620,iamSelected,CIG,20150122,735620,4.40902,-0.4255319148934609,-3.0,iamSelected,GRH,20150122,735620,0.62,-3.0,735620,iamSelected,CIGG,20150122,735620,4.40902,-0.4255319148934609,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150124,735622,iamSelected,B,20150124,735622,7,-3
20150125,735623,iamSelected,K,20150125,735622,10,6.5,735623,iamSelected,B,20150125,735623,7,-3
20150126,735624,iamSelected,KK,20150126,735624,10,6.5
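join prints the join field once, then the remaining fields of the file1 line, then the remaining fields of the file2 line; -a1 -a2 additionally print the unpairable lines from each file. That is why file2's second field (735620 and so on) is repeated in the output above, which differs slightly from the filex the question asks for. The Python sketch below models that output format for this data; it assumes each date occurs at most once per file, and join_like is a hypothetical name, not a real library function.

```python
def join_like(lines1, lines2, sep=","):
    """Rough model of `join -t, -a1 -a2` on field 1, assuming unique keys."""
    by_key = {line.partition(sep)[0]: line for line in lines2}
    out = []
    for line in lines1:
        key = line.partition(sep)[0]
        if key in by_key:
            other = by_key.pop(key)
            # join field + file1's other fields, then file2's other fields
            out.append(line + sep + other.partition(sep)[2])
        else:
            out.append(line)  # -a1: keep unpaired file1 lines
    out.extend(by_key.values())  # -a2: keep unpaired file2 lines
    # join emits lines in key order; a stable sort on field 1 approximates that
    return sorted(out, key=lambda l: l.partition(sep)[0])
```

Only file2's first field is dropped when lines are paired, which is exactly where the extra 735620 comes from.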

Answer 2 (score: 0)

Python code:

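def file_contents(file_name):
    # Read the non-empty lines of a file, without trailing newlines
    with open(file_name, 'r') as fn:
        return [line.rstrip('\n') for line in fn if line.strip()]

f1_cont = sorted(file_contents('file1'))
f2_cont = sorted(file_contents('file2'))

with open('filex', 'w') as out_put:
    for f in f1_cont:
        date = f.split(",")[0]
        # Look for the first file2 line with the same date
        for j in range(len(f2_cont)):
            if f2_cont[j].split(",")[0] == date:
                # Append file2's fields minus its first two (date, day number)
                extra = ",".join(f2_cont[j].split(",")[2:])
                out_put.write(f + "," + extra + "\n")
                del f2_cont[j]  # consume the matched line
                break
        else:
            # No match in file2: write the file1 line unchanged
            out_put.write(f + "\n")
    # Lines left in file2 had no matching date in file1
    for i in f2_cont:
        out_put.write(i + "\n")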

This produces the output you wanted (as shown in your question):

20150122,735620,iamSelected,CIG,20150122,735620,4.40902,-0.4255319148934609,-3.0,iamSelected,GRH,20150122,735620,0.62,-3.0,iamSelected,CIGG,20150122,735620,4.40902,-0.4255319148934609,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150124,735622,iamSelected,B,20150124,735622,7,-3
20150125,735623,iamSelected,K,20150125,735622,10,6.5,iamSelected,B,20150125,735623,7,-3
20150126,735624,iamSelected,KK,20150126,735624,10,6.5