I have a file that looks like this:
194170,46.9,42.2
194170,47.7,40.0
194170,48.5,42.0
194170,48.6,43.0
194170,49.8,39.2
194170,50.2,43.3
194179,44.9,36.9
194179,45.3,36.3
194179,46.4,36.9
194179,47.5,34.4
194179,48.0,40.0
194179,49.6,37.1
194184,52.8,51.1
194184,52.9,49.8
194184,54.0,51.9
194184,56.8,54.9
194184,57.6,53.6
194184,57.8,52.9
...
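In each row, the first number is an ID, and the second and third numbers are the ones I'm interested in. For rows with the same ID (i.e. every six rows), the numbers in a given column are values for consecutive years. I want to end up with a file that looks like this: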
194170,46.9,47.7,48.5,48.6,49.8,50.2
194170,42.2,40.0,42.0,43.0,39.2,43.3
194179,44.9,45.3,46.4,47.5,48.0,49.6
194179,36.9,36.3,36.9,34.4,40.0,37.1
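That is, for rows with the same ID, I want to put the consecutive numbers from the second column together on one line, and do the same for the third column.
Is this possible with awk / sed / something else?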
Answer 0 (score: 1)
Another awk answer:
awk -F, '{a[$1] = a[$1]","$2}END{for(i in a) print i a[i]}' yourfile
For both columns:
awk -F, '{a[$1] = a[$1]","$2;b[$1] = b[$1]","$3}END{for(i in a) print i a[i]"\n"i b[i]}' yourfile
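The one-liners above print in whatever order for (i in a) visits the keys, which awk leaves unspecified. Below is a minimal sketch that keeps the same array approach but also records the order in which each ID first appears, so the output follows the input order. The function name regroup_ordered is hypothetical, and like the original it holds all of the data in memory.

```shell
# regroup_ordered: hypothetical helper name; the awk body extends the two-column one-liner above.
regroup_ordered() {
  awk -F, '
    !($1 in a) { order[++n] = $1 }   # first time this ID is seen: remember its position
    {
      a[$1] = a[$1] "," $2           # append column 2 (leading comma, as in the one-liner)
      b[$1] = b[$1] "," $3           # append column 3
    }
    END {
      for (i = 1; i <= n; i++) {     # walk the IDs in first-seen order
        id = order[i]
        print id a[id]
        print id b[id]
      }
    }
  ' "$1"
}
```

Usage: regroup_ordered yourfile. Because IDs are looked up in an array, rows with the same ID do not need to be adjacent.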
That said, I'd prefer tidyr in R for this task.
Answer 1 (score: 0)
Using awk:
awk -F',' '{ a[$1] = a[$1] ? a[$1] FS $2 : $2 ; b[$1] = b[$1] ? b[$1] FS $3 : $3}
END { for(idx in a){ print idx,a[idx] ; print idx,b[idx]}}' yourfile
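Explanation:
  -F     sets the field separator
  a[]    collects the second-column values for each ID
  b[]    collects the third-column values for each ID
  END{}  prints the values
Example: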
$ awk -F',' '{ a[$1] = a[$1] ? a[$1] FS $2 : $2 ; b[$1] = b[$1] ? b[$1] FS $3 : $3}
END { for(idx in a){ print idx,a[idx] ; print idx,b[idx]}}' yourfile
194170 46.9,47.7,48.5,48.6,49.8,50.2
194170 42.2,40.0,42.0,43.0,39.2,43.3
194184 52.8,52.9,54.0,56.8,57.6,57.8
194184 51.1,49.8,51.9,54.9,53.6,52.9
194179 44.9,45.3,46.4,47.5,48.0,49.6
194179 36.9,36.3,36.9,34.4,40.0,37.1
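The example output has a space after the ID because print idx,a[idx] joins its arguments with OFS, which defaults to a space. Here is a sketch of the same approach with OFS set to a comma, so the lines match the requested format. It also tests membership with ($1 in a) instead of relying on the truth value of a[$1]: in typical awks a first value that looks like 0 counts as false, so the original ternary would overwrite it. The function name regroup_commas is hypothetical, and the output order is still unspecified.

```shell
# regroup_commas: hypothetical helper name; answer 1 with OFS="," and an explicit membership test.
regroup_commas() {
  awk -F',' -v OFS=',' '
    {
      if ($1 in a) {                 # ID seen before: append with the field separator
        a[$1] = a[$1] FS $2
        b[$1] = b[$1] FS $3
      } else {                       # first row for this ID: start the lists
        a[$1] = $2
        b[$1] = $3
      }
    }
    END { for (idx in a) { print idx, a[idx]; print idx, b[idx] } }  # for-in order is unspecified
  ' "$1"
}
```

Usage: regroup_commas yourfile. Pipe the result through sort if you need a stable order of IDs.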
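Answer 2 (score: 0)
Another awk version that uses no arrays and preserves the original order. If the file is very large, you may not want to load all of the data into memory before printing, which is what the array versions do; otherwise the array version is fine, assuming you don't care about ordering. This version assumes that rows with the same ID are adjacent, as they are in the sample.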
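# v3.awk: streams the input; rows with the same ID must be adjacent
BEGIN { FS = OFS = "," }
!prev_id { prev_id = $1 }                                 # first record: remember its ID
$1 == prev_id { r1 = r1 OFS $2; r2 = r2 OFS $3 }          # same ID: append columns 2 and 3
$1 != prev_id { print prev_id r1 ORS prev_id r2;          # new ID: print the finished group
                r1 = OFS $2; r2 = OFS $3; prev_id = $1 }  # and start the next one
END { print prev_id r1 ORS prev_id r2 }                   # print the last group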
$ awk -f v3.awk file.txt
194170,46.9,47.7,48.5,48.6,49.8,50.2
194170,42.2,40.0,42.0,43.0,39.2,43.3
194179,44.9,45.3,46.4,47.5,48.0,49.6
194179,36.9,36.3,36.9,34.4,40.0,37.1
194184,52.8,52.9,54.0,56.8,57.6,57.8
194184,51.1,49.8,51.9,54.9,53.6,52.9
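Answer 2 hard-codes columns 2 and 3. A sketch of the same streaming idea generalized to any number of value columns: each column after the ID is accumulated into its own output line, and a group is printed as soon as the ID changes, so memory use stays bounded by one group. Assumptions not in the original: the names regroup_stream and emit_group are hypothetical, rows with the same ID are adjacent, and every row in a group has the same number of columns.

```shell
# regroup_stream: hypothetical helper name; streaming regroup for any number of value columns.
regroup_stream() {
  awk '
    BEGIN { FS = OFS = "," }
    NR == 1 || ($1 "") != prev {     # first record, or the ID changed (compared as strings)
      if (NR > 1) emit_group()       # print the group that just ended
      prev = $1 ""
      ncol = NF                      # assumes every row in the group has this many fields
      for (i = 2; i <= NF; i++) row[i] = prev
    }
    { for (i = 2; i <= NF; i++) row[i] = row[i] OFS $i }   # append the values of this row
    END { if (NR > 0) emit_group() }  # print the last group
    function emit_group(    i) { for (i = 2; i <= ncol; i++) print row[i] }
  ' "$1"
}
```

Usage: regroup_stream file.txt. On the sample data it prints the same lines as v3.awk.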