I have two files, file1 and file2, and I need to merge them by date into filex. Here is an example:
file1:
20150122,735620,iamSelected,CIG,20150122,735620,4.40902,-0.4255319148934609,-3.0,iamSelected,GRH,20150122,735620,0.62,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150124,735622,iamSelected,B,20150124,735622,7,-3
20150125,735623,iamSelected,K,20150125,735622,10,6.5
file2:
20150122,735620,iamSelected,CIGG,20150122,735620,4.40902,-0.4255319148934609,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150125,735623,iamSelected,B,20150125,735623,7,-3
20150126,735624,iamSelected,KK,20150126,735624,10,6.5
The output, filex, should look like this:
filex:
20150122,735620,iamSelected,CIG,20150122,735620,4.40902,-0.4255319148934609,-3.0,iamSelected,GRH,20150122,735620,0.62,-3.0,iamSelected,CIGG,20150122,735620,4.40902,-0.4255319148934609,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150124,735622,iamSelected,B,20150124,735622,7,-3
20150125,735623,iamSelected,K,20150125,735622,10,6.5,iamSelected,B,20150125,735623,7,-3
20150126,735624,iamSelected,KK,20150126,735624,10,6.5
I tried:
os.system("awk -F, 'NR==FNR{ a[$1]=$2 FS $3; next }{ if($1 in a) $0=$0 OFS a[$1] }1' file1 OFS=',' file2 >output")
But it doesn't work! Any help?
Answer 0 (score: 1)
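Your awk code doesn't work because a[$1]=$2 FS $3 stores only the second and third fields of each file1 line, keyed by $1: the rest of the record is lost, and lines that appear only in file1 are never printed. The solution below uses the compound key $1 OFS $2 (if that's not what you want, remove OFS $2 from the hash references), strips those two fields from $0, and stores the rest of the line as the hashed data.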
Try this:
$ awk 'BEGIN{FS=OFS=","} NR==FNR{k=$1 OFS $2;sub(/^([^,]+,){2}/,"");a[k]=$0;next}{print $0 (a[$1 OFS $2]==""?"":OFS) a[$1 OFS $2];delete a[$1 OFS $2]}END{for(i in a)print i,a[i]}' file2 file1
20150122,735620,iamSelected,CIG,20150122,735620,4.40902,-0.4255319148934609,-3.0,iamSelected,GRH,20150122,735620,0.62,-3.0,iamSelected,CIGG,20150122,735620,4.40902,-0.4255319148934609,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150124,735622,iamSelected,B,20150124,735622,7,-3
20150125,735623,iamSelected,K,20150125,735622,10,6.5,iamSelected,B,20150125,735623,7,-3
20150126,735624,iamSelected,KK,20150126,735624,10,6.5
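Since the question calls awk through os.system, here is a sketch of running the same program with Python's subprocess.run instead, passing the arguments as a list so no shell quoting is involved. The helper name merge_with_awk is hypothetical. The program spells the {2} interval out as [^,]+,[^,]+, on the assumption that your awk might be an older build without interval-expression support; otherwise it follows the one-liner above.

```python
import subprocess

# The awk program from the answer above, with the {2} interval expanded
# (assumption: some older awk builds do not support interval expressions).
AWK_PROGRAM = r'''
BEGIN { FS = OFS = "," }
NR == FNR { k = $1 OFS $2; sub(/^[^,]+,[^,]+,/, ""); a[k] = $0; next }
{
    key = $1 OFS $2
    print $0 (a[key] == "" ? "" : OFS) a[key]
    delete a[key]
}
END { for (i in a) print i, a[i] }
'''


def merge_with_awk(file2, file1, out_path):
    """Run awk with an argument list (no shell) and write its stdout to out_path."""
    with open(out_path, "w") as out:
        # check=True raises CalledProcessError if awk exits with a non-zero status
        subprocess.run(["awk", AWK_PROGRAM, file2, file1], stdout=out, check=True)
```

Note the argument order: file2 must come first, because the NR == FNR block hashes whichever file is read first.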
Explanation:
$ awk '
BEGIN { FS=OFS="," } # delimiters
NR==FNR { # file2
k=$1 OFS $2 # construct key for hashing
sub(/^([^,]+,){2}/,"") # remove the first two fields
a[k]=$0 # hash
next
}
{ # file1
print $0 (a[$1 OFS $2]==""?"":OFS) a[$1 OFS $2] # merge and print
delete a[$1 OFS $2] # delete hash entry
}
END { # print file2 entries that never matched a file1 line
for(i in a)
print i,a[i]
}' file2 file1
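For readers who would rather stay in Python (the question already drives awk from Python), the same logic can be sketched with a dict: hash file2 by the compound key, walk file1 appending and deleting matches, then emit the leftovers. The function names split_key and merge_by_key are hypothetical, and the sketch works on lists of lines rather than files.

```python
def split_key(line):
    """Split a comma-separated line into (key, rest), where the key is the
    first two fields, mirroring the awk answer's k = $1 OFS $2."""
    fields = line.split(",")
    return ",".join(fields[:2]), ",".join(fields[2:])


def merge_by_key(lines1, lines2):
    """Append to each file1 line the non-key fields of the file2 line with the
    same key; file2 lines whose key never appears in file1 go at the end."""
    # NR==FNR block: hash file2 as key -> remaining fields
    table = {}
    for line in lines2:
        key, rest = split_key(line)
        table[key] = rest
    merged = []
    # file1 block: look up, append, and delete the entry (dict.pop does both)
    for line in lines1:
        key, _ = split_key(line)
        rest = table.pop(key, "")
        merged.append(line + "," + rest if rest else line)
    # END block: unmatched file2 entries, rebuilt as key + remainder.
    # Unlike awk's for (i in a), a dict keeps file2's original order.
    merged.extend(key + "," + rest for key, rest in table.items())
    return merged


# Hypothetical usage, assuming file1 and file2 are in the current directory:
# with open("file1") as a, open("file2") as b:
#     lines1, lines2 = a.read().splitlines(), b.read().splitlines()
# with open("filex", "w") as out:
#     out.write("\n".join(merge_by_key(lines1, lines2)) + "\n")
```

Using dict.pop with a default keeps the "look up, then delete" pair of the awk program in a single call, so only unmatched file2 keys survive to the final loop.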
Answer 1 (score: 0)
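A short solution using the join command (join expects both inputs to be sorted on the join field, which the example files already are):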
join -j1 -t, -a1 -a2 file1 file2 > filex
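Contents of filex (only the first field is the join key, so the second field of each matched file2 line, e.g. 735620, is kept after the file1 fields):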
20150122,735620,iamSelected,CIG,20150122,735620,4.40902,-0.4255319148934609,-3.0,iamSelected,GRH,20150122,735620,0.62,-3.0,735620,iamSelected,CIGG,20150122,735620,4.40902,-0.4255319148934609,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150124,735622,iamSelected,B,20150124,735622,7,-3
20150125,735623,iamSelected,K,20150125,735622,10,6.5,735623,iamSelected,B,20150125,735623,7,-3
20150126,735624,iamSelected,KK,20150126,735624,10,6.5
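To see why the second field survives, here is a sketch of the sorted merge (two-pointer) join that join -t, -a1 -a2 performs on field 1, written in Python. The function name is hypothetical; the sketch assumes each key appears at most once per file (join would pair every combination of duplicates) and compares keys as plain strings, which matches join's ordering for fixed-width YYYYMMDD dates like these.

```python
def outer_join_first_field(lines1, lines2):
    """Merge-join two lists of comma-separated lines sorted by their first
    field, keeping unpaired lines from both sides (like join -t, -a1 -a2).
    Assumes each key occurs at most once per input."""
    out = []
    i = j = 0
    while i < len(lines1) and j < len(lines2):
        k1 = lines1[i].split(",", 1)[0]
        k2 = lines2[j].split(",", 1)[0]
        if k1 == k2:
            # Paired: the whole file1 line, then file2's line minus only its key
            # field, which is why file2's second field (e.g. 735623) remains.
            rest2 = lines2[j].split(",", 1)[1:]
            out.append(",".join([lines1[i]] + rest2))
            i += 1
            j += 1
        elif k1 < k2:
            out.append(lines1[i])  # only in file1 (-a1)
            i += 1
        else:
            out.append(lines2[j])  # only in file2 (-a2)
            j += 1
    # Whatever remains on either side has no partner
    out.extend(lines1[i:])
    out.extend(lines2[j:])
    return out
```

Because it never looks back, a merge join like this only works when both inputs are already sorted on the key, which is the same requirement join has.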
Answer 2 (score: 0)
Python code:
def file_contents(file_name):
    # Read all non-blank lines, without their trailing newlines
    with open(file_name) as fn:
        return [line.rstrip('\n') for line in fn if line.strip()]

f1_cont = sorted(file_contents('file1'))
f2_cont = sorted(file_contents('file2'))

with open('filex', 'w') as out_put:
    for f in f1_cont:
        date = f.split(",")[0]
        for j, g in enumerate(f2_cont):
            if g.split(",")[0] == date:
                # Append file2's fields after the first two (date and serial number)
                out_put.write(f + ',' + ",".join(g.split(",")[2:]) + "\n")
                del f2_cont[j]  # safe: we stop iterating right after deleting
                break
        else:
            # No line in file2 has this date: copy the file1 line unchanged
            out_put.write(f + "\n")
    # Lines whose date appears only in file2
    for i in f2_cont:
        out_put.write(i + "\n")
This produces the output you wanted (as given in your question):
20150122,735620,iamSelected,CIG,20150122,735620,4.40902,-0.4255319148934609,-3.0,iamSelected,GRH,20150122,735620,0.62,-3.0,iamSelected,CIGG,20150122,735620,4.40902,-0.4255319148934609,-3.0
20150123,735621,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6,iamSelected,A,20150123,735621,5,6,iamSelected,AA,20150123,735621,7,6
20150124,735622,iamSelected,B,20150124,735622,7,-3
20150125,735623,iamSelected,K,20150125,735622,10,6.5,iamSelected,B,20150125,735623,7,-3
20150126,735624,iamSelected,KK,20150126,735624,10,6.5