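Grouping related rows of data into single columns in Linux

Time: 2018-11-07 22:14:35

Tags: linux csv awk sed text-processing

I have a CSV file that is generated automatically every day, with output similar to the following example: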

"N","3.5",3,"Bob","10/29/17" 
"Y","4.5",5,"Bob","10/11/18" 
"Y","5",6,"Bob","10/28/18" 
"Y","3",1,"Jim", 
"N","4",2,"Jim","09/29/17" 
"N","2.5",4,"Joe","01/26/18"

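I need to transform the text so that it is grouped by person (the fourth column), with all of that person's records on a single line, repeating columns 1, 2, 3, 5 in the same order for each record. Some cells may be missing data, but they must keep their place in the sequence so that the columns stay aligned. The output I need would look like this: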

"Bob","N","3.5",3,"10/29/17","Y","4.5",5,"10/11/18","Y","5",6,"10/28/18"
"Jim","Y","3",1,,"N","4",2,"09/29/17"
"Joe","N","2.5",4,"01/26/18"

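I'm open to using sed, awk, or pretty much any standard Linux command to get this done. I've been trying with awk, and although I'm close, I can't figure out how to finish it.

Here is the command that gets me closest. It prints the header line and the names, but none of the other data: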

awk -F"," 'NR==1; NR>1 {a[$4]=a[$4] ? i : ""} END {for (i in a) {print i}}' test2.csv

3 answers:

Answer 0: (score: 2)

You just need a bit more code

$ awk 'BEGIN {FS=OFS=","} 
             {k=$4; $4=$5; NF--; a[k]=(k in a?a[k] FS $0:$0)} 
       END   {for(k in a) print k,a[k]}' file

"Bob","N","3.5",3,"10/29/17" ,"Y","4.5",5,"10/11/18" ,"Y","5",6,"10/28/18" 
"Jim","Y","3",1, ,"N","4",2,"09/29/17" 
"Joe","N","2.5",4,"01/26/18"

Note that the NF-- trick (decrementing NF to drop the last field) may not work in all awks.

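Answer 1: (score: 0)

Could you please try the following as well? It reads Input_file twice and produces output in the same order in which the 4th-column values first appear in Input_file.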

awk '
BEGIN{
  FS=OFS=","
}
FNR==NR{
  a[$4]=a[$4]?a[$4] OFS $1 OFS $2 OFS $3 OFS $5:$4 OFS $1 OFS $2 OFS $3 OFS $5
  next
}
a[$4]{
  print a[$4]
  delete a[$4]
}
'  Input_file  Input_file

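Answer 2: (score: 0)

If there is any chance that a CSV value contains a comma, I would recommend using a "CSV-aware" tool for a reliable and straightforward solution.

One approach is to use one of the many readily available csv2tsv command-line tools. With that in place, a variety of elegant solutions become possible. For example, the CSV can be piped through csv2tsv, awk, and tsv2csv.

Here is another solution, using csv2tsv and jq: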

csv2tsv < input.csv | jq -Rrn '
  [inputs | split("\t")]
  | group_by(.[3])[]
  | sort_by(.[2])
  | [.[0][3]] + ( map( del(.[3])) | add)
  | @csv
'

This produces:

"Bob","N","3.5","3","10/29/17 ","Y","4.5","5","10/11/18 ","Y","5","6","10/28/18 "
"Jim","Y","3","1"," ","N","4","2","09/29/17 "
"Joe","N","2.5","4","01/26/18"

Trimming the excess spaces is left as an exercise :-)
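One way to approach that exercise, assuming the stray spaces only ever occur at the end of each input line (as they do in the sample data), is to strip trailing whitespace before the file reaches any other tool. The function name strip_trailing_space is hypothetical; this is a sketch, not part of the original answer.

```shell
# strip_trailing_space is a hypothetical helper name.
# Removes whitespace at the end of every line read from standard input.
strip_trailing_space() {
  sed 's/[[:space:]]*$//'
}
```

It could be placed at the front of the pipeline, for example strip_trailing_space < input.csv | csv2tsv | jq ... with the same jq program as above.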