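I'm writing a script to modify a CSV file.
Here is my CSV file:
"ID", "Subject" , "Channels", "Description"
"24" , "Bind-0.9.3" , "Linux" , "BIND (Berkeley Internet Name Domain) is an implementation of the DNS (Domain Name System) protocols"
"24" , "Bind-0.9.3" , "Fedora" , "BIND (Berkeley Internet Name Domain) is an implementation of the DNS (Domain Name System) protocols"
"25" , "Tar-8.0.1" , "Debian" , "Tar Package"
"25" , "Tar-8.0.1" , "Ubuntu" , "Tar Package"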
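Now I want to compare the "ID" values. If several rows have the same ID, I want to join their "Channels" values into a single field.
Here is the expected result:
"ID", "Subject" , "Channels", "Description"
"24" , "Bind-0.9.3" , "Linux,Fedora" , "BIND (Berkeley Internet Name Domain) is an implementation of the DNS (Domain Name System) protocols"
"25" , "Tar-8.0.1" , "Debian,Ubuntu" , "Tar Package"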
Does anyone have an idea how to do this with awk, sed or something else?
Many thanks.
Regards,
Answer 0 (score: 1)
$ cat tst.awk
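BEGIN { FS="[[:space:]]*,[[:space:]]*"; OFS=" , " }   # split on commas, ignoring surrounding spaces
NR==1 { print; next }                                  # print the header unchanged
{
    subj[$1] = $2
    desc[$1] = $4
    if ($1 in chans) {
        chans[$1] = chans[$1] OFS $3    # append this channel to those already seen for this ID
    }
    else {
        chans[$1] = $3
        cnt2chan[++numChans] = $1       # remember the order in which IDs first appear
    }
}
END {
    for (chanNr=1; chanNr<=numChans; chanNr++) {
        chan = cnt2chan[chanNr]
        gsub(/\"/,"",chans[chan])       # drop the per-channel quotes: "Linux" , "Fedora" -> Linux , Fedora
        print chan, subj[chan], "\"" chans[chan] "\"", desc[chan]
    }
}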
$
$ awk -f tst.awk file
"ID", "Subject" , "Channels", "Description"
"24" , "Bind-0.9.3" , "Linux , Fedora" , "BIND (Berkeley Internet Name Domain) is an implementation of the DNS (Domain Name System) protocols"
"25" , "Tar-8.0.1" , "Debian , Ubuntu" , "Tar Package"
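The script above splits every line on commas, so it assumes that no field (in particular the description) contains a comma, and it joins the channels with " , " rather than the plain comma shown in the expected result. A minimal POSIX awk sketch that addresses both points is below. It extracts each double-quoted field with `match()`/`substr()` instead of relying on `FS`, and joins channels with ",". `merge_channels` is a hypothetical helper name. The sketch assumes every field is wrapped in double quotes and that no field contains an embedded or escaped double quote.

```shell
# merge_channels is a hypothetical helper name; it reads the CSV on stdin.
# Assumption: every field is double-quoted and contains no embedded quotes.
merge_channels() {
    awk '
    NR == 1 { print; next }               # pass the header through unchanged
    {
        # Collect the quoted fields into f[1..n] with match()/substr(),
        # so a comma inside a quoted description does not shift the fields.
        n = 0
        line = $0
        split("", f)                      # portable way to empty the array
        while (match(line, /"[^"]*"/)) {
            f[++n] = substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
        if (n < 4) next                   # skip lines with fewer than 4 quoted fields

        id = f[1]
        ch = f[3]
        gsub(/"/, "", ch)                 # strip the quotes around the channel name
        if (id in chans) {
            chans[id] = chans[id] "," ch  # join with a plain comma
        } else {
            chans[id] = ch
            order[++n_ids] = id           # remember first-seen order of IDs
            subj[id] = f[2]               # subject/description come from the first row of each ID
            desc[id] = f[4]
        }
    }
    END {
        for (i = 1; i <= n_ids; i++) {
            id = order[i]
            print id " , " subj[id] " , \"" chans[id] "\" , " desc[id]
        }
    }
    '
}
```

For example, `merge_channels < file` prints the merged rows to standard output. As in the answer above, IDs are printed in the order they first appear, and rows with the same ID do not need to be adjacent.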
Answer 1 (score: 0)
This might work for you (GNU sed):
sed -r ':a;$!N;s/^("[0-9]*")\s*,\s*"[^"]*"\s*,\s*"([^"]*)".*\n(\1\s*,\s*"[^"]*"\s*,\s*")/\3\2,/;ta;P;D' file
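Keep a running window of two lines in the pattern space. If both lines start with the same ID, prepend the channel of the first line to the channel field of the second line, delete the first line, and repeat.
N.B. The header is left untouched because it does not match the required pattern (its first field is not a quoted number).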
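Spread over several lines with a comment per command, the same GNU sed program may be easier to follow. The commands are unchanged from the one-liner; only the layout differs. `merge_adjacent` is a hypothetical wrapper name, and GNU sed is still required, since `-r` and `\s` are GNU extensions.

```shell
# merge_adjacent is a hypothetical wrapper name; it reads the CSV on stdin.
# Same commands as the one-liner, one per line. Requires GNU sed.
merge_adjacent() {
    sed -r '
        # Label to jump back to after a successful merge.
        :a
        # Unless this is the last line, append the next line: a two-line window.
        $!N
        # If both lines start with the same quoted ID, drop line 1 and put its
        # channel (\2) plus a comma in front of the channel of line 2 (\3 is
        # line 2 up to and including the opening quote of its channel).
        s/^("[0-9]*")\s*,\s*"[^"]*"\s*,\s*"([^"]*)".*\n(\1\s*,\s*"[^"]*"\s*,\s*")/\3\2,/
        # After a merge, loop to compare the merged line with the next one.
        ta
        # No merge: print the first line, delete it, and restart the cycle
        # with the remaining line still in the pattern space.
        P
        D
    '
}
```

Because only neighbouring lines are compared, rows that share an ID are merged only when they are adjacent (the awk answer has no such requirement). If they are not adjacent, one option is to sort the data lines first while keeping the header out of the sort, for example `{ head -n 1 file; tail -n +2 file | LC_ALL=C sort -s -t, -k1,1; } | merge_adjacent` (the `-s` stable-sort flag is a GNU/BSD extension).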