Compare strings and join values of one field in the bash shell

Time: 2014-10-13 02:13:45

Tags: linux bash awk sed


I am writing a script to modify a CSV file.

Here is my CSV file:

"ID", "Subject" , "Channels", "Description"
"24" , "Bind-0.9.3" , "Linux", "BIND (Berkeley Internet Name Domain) is an implementation of the DNS (Domain Name System) protocols"
"24" , "Bind-0.9.3", "Fedora", "BIND (Berkeley Internet Name Domain) is an implementation of the DNS (Domain Name System) protocols"
"25" , "Tar-8.0.1" , "Debian", "Tar Package"
"25" , "Tar-8.0.1", "Ubuntu" , "Tar Package"

Now I want to compare the "ID" values. If several rows have the same ID, their "Channels" values should be joined into a single field.

Expected result:

"ID", "Subject" , "Channels", "Description"
"24" , "Bind-0.9.3" , "Linux,Fedora", "BIND (Berkeley Internet Name Domain) is an implementation of the DNS (Domain Name System) protocols"
"25" , "Tar-8.0.1" , "Debian,Ubuntu", "Tar Package"

Does anyone have an idea how to do this with awk, sed, or something else?
Thanks and regards,

2 answers:

Answer 0: (score: 1)

$ cat tst.awk
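# Split on commas, ignoring any whitespace around them.
BEGIN { FS="[[:space:]]*,[[:space:]]*"; OFS=" , " }
# Pass the header line through unchanged.
NR==1 { print; next }
{
    subj[$1] = $2
    desc[$1] = $4
    if ($1 in chans) {
        # ID seen before: append this channel to the ones collected so far.
        chans[$1] = chans[$1] OFS $3
    }
    else {
        chans[$1] = $3
        # Remember the order in which IDs first appear.
        cnt2chan[++numChans] = $1
    }
}
END {
    for (chanNr=1; chanNr<=numChans; chanNr++) {
        chan = cnt2chan[chanNr]
        # Drop the inner quotes, then re-quote the joined channel list.
        gsub(/\"/,"",chans[chan])
        print chan, subj[chan], "\"" chans[chan] "\"", desc[chan]
    }
}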
$
$ awk -f tst.awk file
"ID", "Subject" , "Channels", "Description"
"24" , "Bind-0.9.3" , "Linux , Fedora" , "BIND (Berkeley Internet Name Domain) is an implementation of the DNS (Domain Name System) protocols"
"25" , "Tar-8.0.1" , "Debian , Ubuntu" , "Tar Package"

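Note that the output above joins the channels with " , ", while the question asks for "Linux,Fedora". A minimal sketch of a variant that joins with a bare comma might look like the following. The function name `merge_channels` is made up for illustration, the sample descriptions are shortened to "BIND", and, like the script above, it assumes that no field contains a comma.

```shell
#!/bin/sh
# merge_channels: hypothetical helper name, not part of the original answer.
# Reads CSV on stdin and writes the merged CSV to stdout.
# Assumption: every field is double-quoted and no field contains a comma.
merge_channels() {
    awk '
    BEGIN { FS = "[ \t]*,[ \t]*"; OFS = "," }
    NR == 1 { print; next }            # header passes through unchanged
    {
        sub(/^[ \t]+/, "")             # strip leading indentation so $1 is a clean key
        id = $1
        chan = $3
        gsub(/"/, "", chan)            # remove the quotes around this channel
        if (id in chans) {
            chans[id] = chans[id] "," chan
        } else {
            chans[id] = chan
            order[++n] = id            # remember first-seen order of IDs
            subj[id] = $2
            desc[id] = $4
        }
    }
    END {
        for (i = 1; i <= n; i++) {
            id = order[i]
            print id, subj[id], "\"" chans[id] "\"", desc[id]
        }
    }'
}

# Usage with shortened sample rows from the question:
printf '%s\n' \
    '"ID", "Subject" , "Channels", "Description"' \
    '"24" , "Bind-0.9.3" , "Linux", "BIND"' \
    '"24" , "Bind-0.9.3", "Fedora", "BIND"' \
    '"25" , "Tar-8.0.1" , "Debian", "Tar Package"' \
    '"25" , "Tar-8.0.1", "Ubuntu" , "Tar Package"' |
    merge_channels
```

Unlike a sed window, this keeps one entry per ID in memory, so rows with the same ID do not have to be adjacent. If a description can contain commas, splitting on "," is no longer safe and a real CSV parser would be the better tool.
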
Answer 1: (score: 0)

This might work for you (GNU sed):

sed -r ':a;$!N;s/^("[0-9]*")\s*,\s*"[^"]*"\s*,\s*"([^"]*)".*\n(\1\s*,\s*"[^"]*"\s*,\s*")/\3\2,/;ta;P;D' file

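Keep a running window of 2 lines in the pattern space. If both lines start with the same ID, combine the channels into the second line, delete the first line, and repeat.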

N.B. The header is left untouched because it does not match the required pattern.
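One consequence of the two-line window is that only adjacent rows with the same ID get merged. The sketch below wraps the same sed command in a function to show this. The name `merge_adjacent`, the shortened "BIND" descriptions, and the extra "Gentoo" row are made up for illustration, and GNU sed is assumed because `-r` and `\s` are GNU extensions.

```shell
#!/bin/sh
# merge_adjacent: hypothetical wrapper around the sed command from this answer.
# Reads CSV on stdin; assumes GNU sed.
merge_adjacent() {
    sed -r ':a;$!N;s/^("[0-9]*")\s*,\s*"[^"]*"\s*,\s*"([^"]*)".*\n(\1\s*,\s*"[^"]*"\s*,\s*")/\3\2,/;ta;P;D'
}

# The first two ID 24 rows are adjacent and get merged. The last ID 24 row
# (invented for this demo) is separated from them by an ID 25 row.
printf '%s\n' \
    '"ID", "Subject" , "Channels", "Description"' \
    '"24" , "Bind-0.9.3" , "Linux", "BIND"' \
    '"24" , "Bind-0.9.3", "Fedora", "BIND"' \
    '"25" , "Tar-8.0.1" , "Debian", "Tar Package"' \
    '"24" , "Bind-0.9.3", "Gentoo", "BIND"' |
    merge_adjacent
```

With this input, the "Linux" and "Fedora" rows should collapse into one row with "Linux,Fedora", while the "Debian" and "Gentoo" rows pass through unmerged. If the rows in the real file are not already grouped by ID, they would need to be sorted first (keeping the header on top), or the awk approach may be a better fit.
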