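I have a CSV data file with 5 columns and 10,000 rows. For each distinct value in column 1, I need to find the maximum and minimum values in column 2 and then write those rows to a new file. I am relatively new to grep, awk, head, tail, and the like. Here are a few lines of the file, which we can call temp.csv:
1553 1806.345000 20130516-044310 33.800000 -97.110000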
1555 2106.947000 20130516-044310 33.470000 -94.620000
1559 2106.947000 20130516-044310 31.460000 -97.260000
1573 1807.591000 20130516-045311 41.150000 -94.020000
1573 2107.911000 20130516-045311 41.120000 -94.020000
1573 2408.994000 20130516-045311 41.120000 -94.050000
1573 2709.545000 20130516-045311 41.090000 -94.020000
1573 3010.308000 20130516-045311 41.090000 -94.020000
1573 3310.988000 20130516-045311 41.090000 -93.990000
1573 3611.129000 20130516-045311 41.120000 -93.960000
1573 3912.392000 20130516-045311 41.090000 -93.960000
1585 1806.756000 20130516-045812 31.040000 -98.880000
1585 2107.839000 20130516-045812 31.040000 -98.850000
1585 2408.390000 20130516-045812 31.010000 -98.820000
1585 2709.153000 20130516-045812 31.010000 -98.790000
1611 1804.813000 20130516-051316 31.280000 -97.800000
For example, from this data I would like the output to look like:
1553 1806.345000 20130516-044310 33.800000 -97.110000
1555 2106.947000 20130516-044310 33.470000 -94.620000
1559 2106.947000 20130516-044310 31.460000 -97.260000
1573 1807.591000 20130516-045311 41.150000 -94.020000
1573 3912.392000 20130516-045311 41.090000 -93.960000
1585 1806.756000 20130516-045812 31.040000 -98.880000
1585 2709.153000 20130516-045812 31.010000 -98.790000
1611 1804.813000 20130516-051316 31.280000 -97.800000
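Some of the values in the first column have only one entry, which is obviously both the maximum and the minimum. Any help would be greatly appreciated.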
Answer 0 (score: 3)
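Here is one way to do it. It does not depend on the data being sorted, but groups are printed in the unspecified order of awk's for (key in ...) loop:
awk '
# Skip blank lines.
!NF { next }
# First row for this key: it is both the minimum and the maximum so far.
!($1 in lo) {
    lo[$1] = hi[$1] = $2 + 0
    map["min",$1] = map["max",$1] = $0
    next
}
# Later rows: compare column 2 against the running minimum and maximum,
# not against the previous row, so unsorted input is handled correctly.
$2 + 0 < lo[$1] { lo[$1] = $2 + 0; map["min",$1] = $0 }
$2 + 0 > hi[$1] { hi[$1] = $2 + 0; map["max",$1] = $0 }
END {
    for (key in lo) {
        print map["min",key]
        # Print the maximum row only when it differs from the minimum row.
        if (map["min",key] != map["max",key])
            print map["max",key]
    }
}' file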
1611 1804.813000 20130516-051316 31.280000 -97.800000
1585 1806.756000 20130516-045812 31.040000 -98.880000
1585 2709.153000 20130516-045812 31.010000 -98.790000
1553 1806.345000 20130516-044310 33.800000 -97.110000
1555 2106.947000 20130516-044310 33.470000 -94.620000
1559 2106.947000 20130516-044310 31.460000 -97.260000
1573 1807.591000 20130516-045311 41.150000 -94.020000
1573 3912.392000 20130516-045311 41.090000 -93.960000
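Since the question asks for the result in a new file, and the for-in order above is not guaranteed, a minimal sketch of one way to get a sorted output file follows. It assumes whitespace-separated fields as in the sample rows; the output name minmax.txt and the array names lo, hi, lorow, and hirow are illustrative choices, not part of the original answer.

```shell
# Recreate a few of the sample rows, deliberately out of order.
# temp.csv is the file name used in the question.
cat > temp.csv <<'EOF'
1573 2107.911000 20130516-045311 41.120000 -94.020000
1553 1806.345000 20130516-044310 33.800000 -97.110000
1573 3912.392000 20130516-045311 41.090000 -93.960000
1573 1807.591000 20130516-045311 41.150000 -94.020000
EOF

# Track the running minimum and maximum of column 2 for each column-1 key,
# then sort numerically by column 1 and column 2 and write the rows to
# minmax.txt (a hypothetical output name).
awk '
NF && !($1 in lo) { lo[$1] = hi[$1] = $2 + 0; lorow[$1] = hirow[$1] = $0; next }
NF {
    if ($2 + 0 < lo[$1]) { lo[$1] = $2 + 0; lorow[$1] = $0 }
    if ($2 + 0 > hi[$1]) { hi[$1] = $2 + 0; hirow[$1] = $0 }
}
END {
    for (k in lo) {
        print lorow[k]
        if (hi[k] != lo[k]) print hirow[k]   # single-entry keys print once
    }
}' temp.csv | LC_ALL=C sort -k1,1n -k2,2n > minmax.txt

cat minmax.txt
```

Setting LC_ALL=C keeps the numeric sort independent of the locale's decimal separator. If the real file is comma-separated, awk would need -F, and sort would need -t, as well.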
Answer 1 (score: 0)
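awk '!NF { next }   # skip blank lines; input must be sorted by column 1, then column 2
# First row of a key: flush the last row of the previous key, then print this row (the minimum).
!a[$1]++ { if (length(p)) print p; print; p = ""; next }
# Later rows of the same key: remember the most recent one (the maximum once the key ends).
{ p = $0 } END { if (length(p)) print p }' file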
Output:
1553 1806.345000 20130516-044310 33.800000 -97.110000
1555 2106.947000 20130516-044310 33.470000 -94.620000
1559 2106.947000 20130516-044310 31.460000 -97.260000
1573 1807.591000 20130516-045311 41.150000 -94.020000
1573 3912.392000 20130516-045311 41.090000 -93.960000
1585 1806.756000 20130516-045812 31.040000 -98.880000
1585 2709.153000 20130516-045812 31.010000 -98.790000
1611 1804.813000 20130516-051316 31.280000 -97.800000
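The first-and-last-row approach only gives the minimum and maximum when each group is contiguous and ascending in column 2. As a hedged sketch, unsorted input could be passed through sort first. The file names unsorted.txt and first_last.txt are hypothetical, and the rows are a shuffled subset of the sample data.

```shell
# A shuffled subset of the sample rows (hypothetical file name).
cat > unsorted.txt <<'EOF'
1585 2709.153000 20130516-045812 31.010000 -98.790000
1553 1806.345000 20130516-044310 33.800000 -97.110000
1585 1806.756000 20130516-045812 31.040000 -98.880000
1585 2107.839000 20130516-045812 31.040000 -98.850000
EOF

# Sort numerically by column 1, then column 2, so that the first row of each
# group is its minimum and the last row is its maximum; then keep only the
# first and last row of every group.
LC_ALL=C sort -k1,1n -k2,2n unsorted.txt |
awk '!NF { next }
!a[$1]++ { if (length(p)) print p; print; p = ""; next }
{ p = $0 } END { if (length(p)) print p }' > first_last.txt

cat first_last.txt
```

Sorting costs O(n log n) where the running min/max approach needs a single pass, but for 10,000 rows the difference is negligible and the output comes out already ordered.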