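I am trying to merge multiple files whose format is similar to the examples shown below. For now I have been experimenting with just two files. The files will always have the same number of lines, the same dates, and the same times, and will be sorted in the same order. The only difference should be in the value field.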
File1.csv
date,time,value,status
2014/09/10,22:47:25,-0.0000000003542,9
2014/09/10,23:14:25,-0.0000000002892,9
2014/09/10,23:23:46,0.0000000005406,9
2014/09/10,23:41:48,-0.0000000000142,9
2014/09/11,00:18:40,-0.0000000009977,9
File2.csv
date,time,value,status
2014/09/10,22:47:25,0.0000000725578,9
2014/09/10,23:14:25,-0.0000000283722,9
2014/09/10,23:23:46,-0.0000000368988,9
2014/09/10,23:41:48,-0.0000000675033,9
2014/09/11,00:18:40,-0.0000000774759,9
Desired output
date,time,value,value
2014/09/10,22:47:25,-0.0000000003542,0.0000000725578
2014/09/10,23:14:25,-0.0000000002892,-0.0000000283722
2014/09/10,23:23:46,0.0000000005406,-0.0000000368988
2014/09/10,23:41:48,-0.0000000000142,-0.0000000675033
2014/09/11,00:18:40,-0.0000000009977,-0.0000000774759
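I am not interested in keeping the status values in the merged result. I have tried several variations of the join command, the most recent being: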
join -t, -a 1 -a 2 -o 1.1 1.2 1.3 2.3 File1.csv File2.csv
Unfortunately, I keep getting output like the following, which shows no data from File1.csv at all.
Current output
date,time,value,value
,,,0.0000000725578
,,,-0.0000000283722
,,,-0.0000000368988
,,,-0.0000000675033
,,,-0.0000000774759
,,,0.0000001042118
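Does anyone have any suggestions?
Thanks.
Update
As a follow-up, I went back and updated the input files so that the date and time are combined into a single field, as shown below.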
File1.csv
DATE_TIME,value,status
2014/09/10 22:47:25,-0.0000000003542,9
2014/09/10 23:14:25,-0.0000000002892,9
2014/09/10 23:23:46,0.0000000005406,9
2014/09/10 23:41:48,-0.0000000000142,9
2014/09/11 00:18:40,-0.0000000009977,9
File2.csv
DATE_TIME,value,status
2014/09/10 22:47:25,0.0000000725578,9
2014/09/10 23:14:25,-0.0000000283722,9
2014/09/10 23:23:46,-0.0000000368988,9
2014/09/10 23:41:48,-0.0000000675033,9
2014/09/11 00:18:40,-0.0000000774759,9
So I updated the join command as follows:
join -t, -a 1 -a 2 -o "1.1 1.2 2.2" File1.csv File2.csv
Unfortunately, I still get output that seems to omit the contents of File1.csv.
Current output
DATE_TIME,value,value
,,0.0000000725578
,,-0.0000000283722
,,-0.0000000368988
,,-0.0000000675033
,,-0.0000000774759
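Update
The problem appears to be tied to the header in each file. If I remove the headers from the files and then try the following join command:
join -t, -a 1 -a 2 -o "1.1 1.2 2.2" File1.csv File2.csv
it gives the desired output: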
2014/09/10 22:47:25,-0.0000000003542,0.0000000725578
2014/09/10 23:14:25,-0.0000000002892,-0.0000000283722
2014/09/10 23:23:46,0.0000000005406,-0.0000000368988
2014/09/10 23:41:48,-0.0000000000142,-0.0000000675033
2014/09/11 00:18:40,-0.0000000009977,-0.0000000774759
Does anyone know how to make join ignore the header line of the input files?
Thanks,
Answer 0 (score: 0)
An untested one-liner:
awk -F, -v OFS="," '{k=$1 FS $2}NR==FNR{a[k]=$3;next}
k in a{print k,a[k],$3}' file1 file2
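For reference, here is the same awk approach spread over several lines and run against a shortened copy of the sample data from the question. Treat it as a sketch only: the temporary directory and the names workdir, key, first and merged are made up for this example, and it assumes each date/time pair occurs at most once per file. The header line passes through without special handling, because "date,time" is just another key that exists in both files.

```shell
# Work in a throwaway directory (hypothetical; any directory will do).
workdir=$(mktemp -d)

# Shortened copies of the sample files from the question.
cat > "$workdir/File1.csv" <<'EOF'
date,time,value,status
2014/09/10,22:47:25,-0.0000000003542,9
2014/09/10,23:14:25,-0.0000000002892,9
2014/09/11,00:18:40,-0.0000000009977,9
EOF
cat > "$workdir/File2.csv" <<'EOF'
date,time,value,status
2014/09/10,22:47:25,0.0000000725578,9
2014/09/10,23:14:25,-0.0000000283722,9
2014/09/11,00:18:40,-0.0000000774759,9
EOF

merged=$(awk -F, -v OFS=, '
    # Build the lookup key from the date and time fields of every line.
    { key = $1 FS $2 }
    # First file: remember its value under that key, then skip to the next line.
    NR == FNR { first[key] = $3; next }
    # Second file: print only the keys that were also seen in the first file.
    key in first { print key, first[key], $3 }
' "$workdir/File1.csv" "$workdir/File2.csv")

printf '%s\n' "$merged"
```

Unlike join, this does not depend on the input being sorted, at the cost of holding the first file's values in memory.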
Answer 1 (score: 0)
You need to put all of the output field specifications in a single argument, so you have to quote it:
join -t, -a 1 -a 2 -o "1.1 1.2 1.3 2.3" File1.csv File2.csv
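However, this will not produce the output you want. join matches lines on a key field, which defaults to the first field. Because the same date appears on multiple lines, every line with a given date in one file is paired with every line with that date in the other, and the result is: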
date,time,value,value
2014/09/10,22:47:25,-0.0000000003542,0.0000000725578
2014/09/10,22:47:25,-0.0000000003542,-0.0000000283722
2014/09/10,22:47:25,-0.0000000003542,-0.0000000368988
2014/09/10,22:47:25,-0.0000000003542,-0.0000000675033
2014/09/10,23:14:25,-0.0000000002892,0.0000000725578
2014/09/10,23:14:25,-0.0000000002892,-0.0000000283722
2014/09/10,23:14:25,-0.0000000002892,-0.0000000368988
2014/09/10,23:14:25,-0.0000000002892,-0.0000000675033
2014/09/10,23:23:46,0.0000000005406,0.0000000725578
2014/09/10,23:23:46,0.0000000005406,-0.0000000283722
2014/09/10,23:23:46,0.0000000005406,-0.0000000368988
2014/09/10,23:23:46,0.0000000005406,-0.0000000675033
2014/09/10,23:41:48,-0.0000000000142,0.0000000725578
2014/09/10,23:41:48,-0.0000000000142,-0.0000000283722
2014/09/10,23:41:48,-0.0000000000142,-0.0000000368988
2014/09/10,23:41:48,-0.0000000000142,-0.0000000675033
2014/09/11,00:18:40,-0.0000000009977,-0.0000000774759
Instead, you can join on the time field:
join -1 2 -2 2 -t, -a 1 -a 2 -o "1.1 1.2 1.3 2.3" File1.csv File2.csv
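This works only as long as the lines are sorted on the time field, which join requires. It will therefore fail if a time is repeated, and it cannot correctly match lines across a day boundary.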
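One possible way around both the header problem and the day-boundary problem is to join on the date and time together and let join deal with the header itself. The sketch below assumes GNU coreutils, whose join has a --header option that treats the first line of each file as a header and prints it without trying to pair it; other join implementations may not offer this. The intermediate files key1.csv and key2.csv and the variables workdir and merged are hypothetical names used only for this example.

```shell
# Work in a throwaway directory (hypothetical; any directory will do).
workdir=$(mktemp -d)

# Shortened copies of the sample files from the question.
cat > "$workdir/File1.csv" <<'EOF'
date,time,value,status
2014/09/10,22:47:25,-0.0000000003542,9
2014/09/10,23:14:25,-0.0000000002892,9
2014/09/11,00:18:40,-0.0000000009977,9
EOF
cat > "$workdir/File2.csv" <<'EOF'
date,time,value,status
2014/09/10,22:47:25,0.0000000725578,9
2014/09/10,23:14:25,-0.0000000283722,9
2014/09/11,00:18:40,-0.0000000774759,9
EOF

# Replace the first comma on each line with a space so that
# "date time" becomes a single field that join can use as its key.
sed 's/,/ /' "$workdir/File1.csv" > "$workdir/key1.csv"
sed 's/,/ /' "$workdir/File2.csv" > "$workdir/key2.csv"

# Join on the combined key, keep only the two value columns,
# then turn the space in the key back into a comma.
# LC_ALL=C makes the sort order that join expects plain byte order.
merged=$(LC_ALL=C join --header -t, -o 1.1,1.2,2.2 \
             "$workdir/key1.csv" "$workdir/key2.csv" | sed 's/ /,/')

printf '%s\n' "$merged"
```

Because the key now includes the date, a time that repeats on a later day no longer collides, although the files still need to be sorted by date and time, as the question says they are.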