Merging CSV files by datetime with shell

Time: 2019-01-29 05:01:18

Tags: shell csv awk

Hi, I have 3 CSV files, as shown below.

datetime, forecast
2016-02-02 00:00:00, 23.34
2016-02-02 00:10:00, 29.23

timestamp, forecast, v1, v2
2016-02-02 00:00:00, 68.56, 012, .23
2016-02-02 00:10:00, 23.24, .25, .32

timestamp, forecast[ma], v1
2016-02-02 00:00:00, 56.32, 32
2016-02-02 00:10:00, 25.21, 56

I would like my output to look like this:

Time, Forecast, forecast1, forecast2
2016-02-02 00:00:00, 23.34, 68.56, 56.32
2016-02-02 00:10:00, 29.23, 23.24, 25.21

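I already wrote Python code that combines these files into an xlsx file. I now plan to process the data further with shell, so I would like the merged result saved as CSV instead.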

I tried something like this:

join -j 2 -o 1.1,1.2,2.2 <(sort -k2 $path_DMS/$file_name) <(sort -k2 $path_ISRO/$file_name)

Thanks.

1 answer:

Answer 0 (score: 1)

Please try the following (this should work in most awk implementations).

awk '
BEGIN{
  FS=OFS=", "
  print "Time, Forecast, forecast1, forecast2"
}
FNR==1{
  ++count
  next
}
count==1{
  a[$1]=$2
  next
}
count==2{
  a[$1]=a[$1] OFS $2
  next
}
count==3{
  print $1,a[$1],$2
}'  file1.csv file2.csv file3.csv

The output is as follows.

Time, Forecast, forecast1, forecast2
2016-02-02 00:00:00, 23.34, 68.56, 56.32
2016-02-02 00:10:00, 29.23, 23.24, 25.21
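
The join attempt in the question can also be made to work. With its default blank delimiter, join splits each timestamp into a separate date token and time token, so -j 2 ends up matching on the time token alone. The sketch below is one possible variant and not part of the original answer: it assumes every timestamp occurs in all three files and that no field contains a comma, and merge_with_join is a hypothetical helper name.

```shell
# Sketch: merge three CSV files on their first column with join(1).
# merge_with_join is a hypothetical helper; it runs in a subshell so that
# its variables and the LC_ALL setting do not leak into the caller.
merge_with_join() (
  tmp=$(mktemp -d) || exit 1
  export LC_ALL=C                 # sort and join must agree on collation
  i=0
  for f in "$1" "$2" "$3"; do
    i=$((i + 1))
    # Drop the header line, then sort on the first comma-separated field.
    tail -n +2 "$f" | sort -t, -k1,1 > "$tmp/$i"
  done
  echo "Time, Forecast, forecast1, forecast2"
  # -t, makes "," the delimiter, so each later field keeps its leading space
  # and the output comes out as ", "-separated again.
  # -o 0 prints the join field; 1.2 and 2.2 are field 2 of each input.
  join -t, -o 0,1.2,2.2 "$tmp/1" "$tmp/2" |
    join -t, -o 0,1.2,1.3,2.2 - "$tmp/3"
  rm -rf "$tmp"
)
```

Usage: merge_with_join file1.csv file2.csv file3.csv > merged.csv. Because join drops unmatched lines by default, a timestamp missing from any one file disappears from the result.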

Explanation: a detailed, line-by-line explanation of the awk program from this answer.

awk '                                                ## Start the awk program.
BEGIN{                                               ## BEGIN runs once, before any input file is read.
  FS=OFS=", "                                        ## Use ", " as both the input (FS) and output (OFS) field separator; see man awk.
  print "Time, Forecast, forecast1, forecast2"       ## Print the output header.
}                                                    ## End of the BEGIN block.
FNR==1{                                              ## FNR==1 is true on the first (header) line of each input file.
  ++count                                            ## Increment count, so it holds the number of the file being read.
  next                                               ## Skip the remaining rules for this line.
}                                                    ## End of the FNR==1 block.
count==1{                                            ## While reading file1.csv:
  a[$1]=$2                                           ## Store the forecast ($2) in array a, keyed by the timestamp ($1).
  next                                               ## Skip the remaining rules for this line.
}                                                    ## End of the count==1 block.
count==2{                                            ## While reading file2.csv:
  a[$1]=a[$1] OFS $2                                 ## Append the forecast ($2) of file2.csv to the value stored from file1.csv.
  next                                               ## Skip the remaining rules for this line.
}                                                    ## End of the count==2 block.
count==3{                                            ## While reading file3.csv:
  print $1,a[$1],$2                                  ## Print the timestamp, the two stored forecasts, and the forecast ($2) of file3.csv.
}'  file1.csv file2.csv file3.csv                    ## The input files, in the order their columns should appear.
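
The program above hard-codes three files and prints only the timestamps found in file3.csv. As a rough sketch of how the same idea could be generalized (this is an extension, not the original answer), the version below accepts any number of files, keeps timestamps in first-seen order, and leaves a field empty when a file has no row for a timestamp. The function name merge_by_timestamp and the array names are made up for illustration. It assumes every file has a header line and that the forecast is always the second field.

```shell
# Sketch: merge field 2 of any number of ", "-separated files by timestamp.
# merge_by_timestamp is a hypothetical helper name.
merge_by_timestamp() {
  awk '
  BEGIN { FS = OFS = ", " }
  FNR == 1 { ++nfiles; next }            # header line: a new file starts here
  {
    if (!($1 in seen)) {                 # first time this timestamp shows up
      seen[$1] = 1
      order[++n] = $1                    # remember first-seen order
    }
    val[$1, nfiles] = $2                 # forecast of the current file
  }
  END {
    header = "Time" OFS "Forecast"
    for (f = 2; f <= nfiles; f++) header = header OFS "forecast" (f - 1)
    print header
    for (i = 1; i <= n; i++) {
      line = order[i]
      for (f = 1; f <= nfiles; f++) line = line OFS val[order[i], f]
      print line
    }
  }' "$@"
}
```

Usage: merge_by_timestamp file1.csv file2.csv file3.csv > merged.csv. All values are held in memory until the END block, which is fine for files of modest size but is a trade-off against the streaming output of the original program.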