I have a CSV file, and I changed some of its columns after extracting them with the cut command.
123;bbb ;10.01.2010
456;ddd;11.01.2015
789;aaa;20.12.2010
222;ccc;15.10.2010
For example, I take the second column, trim it, and sort it with the following command:
cut -f 2 -d ';' data.csv | sed 's/^[ \t]*//;s/[ \t]*$//' | sort
How can I overwrite that column in the file with the new values, so that the same file ends up looking like this?
123;aaa;10.01.2010
456;bbb;11.01.2015
789;ccc;20.12.2010
222;ddd;15.10.2010
Answer 0 (score: 2)
Input
$ cat f
123;bbb ;10.01.2010
456;ddd;11.01.2015
789;aaa;20.12.2010
222;ccc;15.10.2010
Using cut, tr, sort and paste
$ paste -d ';' <(cut -f 1 -d ';' f) <(cut -f 2 -d ';' f | tr -d ' ' | sort) <(cut -f 3 -d ';' f)
123;aaa;10.01.2010
456;bbb;11.01.2015
789;ccc;20.12.2010
222;ddd;15.10.2010
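The paste command above prints to standard output and relies on bash process substitution, while the question asks for the file itself to be overwritten. The following is a minimal sketch of one way to do that with plain sh, assuming a scratch directory and the file name data.csv from the question; the temporary file names (col1.tmp, col3.tmp, data.csv.new) are illustrative only. It leaves the first and third columns in their original order, as in the desired output of the question.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
cd "$workdir"

# Sample input from the question (note the trailing space after "bbb").
printf '%s\n' \
    '123;bbb ;10.01.2010' \
    '456;ddd;11.01.2015' \
    '789;aaa;20.12.2010' \
    '222;ccc;15.10.2010' > data.csv

# Columns 1 and 3 are saved unchanged in temporary files.
cut -d ';' -f 1 data.csv > col1.tmp
cut -d ';' -f 3 data.csv > col3.tmp

# Column 2 is trimmed and sorted, then pasted between the other two.
# The "-" operand makes paste read that column from standard input.
# Output goes to a new file: redirecting straight to data.csv would
# truncate it before cut had read it.
cut -d ';' -f 2 data.csv \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | sort \
    | paste -d ';' col1.tmp - col3.tmp > data.csv.new

# Replace the original only after the pipeline has finished.
mv data.csv.new data.csv
rm -f col1.tmp col3.tmp

cat data.csv
```

Writing to a separate file and renaming it afterwards is the design choice that matters here; a direct `> data.csv` on the same pipeline would empty the file before it is read.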
Using cut, tr, sort and pr
$ pr -mtJs';' <(cut -f 1 -d ';' f) <(cut -f 2 -d ';' f | tr -d ' ' | sort) <(cut -f 3 -d ';' f)
123;aaa;10.01.2010
456;bbb;11.01.2015
789;ccc;20.12.2010
222;ddd;15.10.2010
Using gawk
(recommended)
$ gawk 'BEGIN{FS=OFS=";"}FNR==NR{sub(/ +/,"",$2);a[$2];next}FNR==1{asorti(a,b)}{$2=b[FNR]}1' f f
123;aaa;10.01.2010
456;bbb;11.01.2015
789;ccc;20.12.2010
222;ddd;15.10.2010
Explanation (the same file is read twice)
gawk '# START SCRIPT
BEGIN{
    FS=OFS=";"   # Set the input and output field separators
}
# NR (records read so far across all files) equals FNR (records read so
# far in the current file) only while the first copy of the file is
# being read.
FNR==NR{
    # Remove the first run of spaces from field 2
    sub(/ +/,"",$2)
    # Record the trimmed field 2 as an index (key) of array "a"
    a[$2]
    # Skip to the next record so the rules below, meant for the
    # second pass, are not applied.
    next
}
# On the first record of the second pass over the file
FNR==1{
    # asorti() sorts the indices of "a" (hence the "i") and stores them
    # in "b" under the numeric indices 1, 2, 3, ...
    asorti(a,b)
}
{
    # Replace field 2 with the sorted value for this line number
    $2=b[FNR]
}1   # The trailing 1 is an always-true pattern, so the default action (print $0) runs
' f f   # Pass the same file twice
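Because the script stores the values of column 2 as array indices, a value that occurs more than once is kept only once, and the last rows would then receive an empty field; asorti() is also a gawk extension. The following is a sketch of a variant that should run under any POSIX awk and keeps duplicates, assuming the sorting is delegated to sort and the sorted column is fed to awk on standard input. The file name data.csv, the output name data.csv.new, and the extra fifth row are hypothetical, added only to show the duplicate case.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so the sketch does not touch real data.
workdir=$(mktemp -d)
cd "$workdir"

# Sample input from the question, plus one HYPOTHETICAL extra row that
# repeats "aaa" to show that duplicates are kept.
printf '%s\n' \
    '123;bbb ;10.01.2010' \
    '456;ddd;11.01.2015' \
    '789;aaa;20.12.2010' \
    '222;ccc;15.10.2010' \
    '333;aaa;01.01.2011' > data.csv

# First input ("-", standard input): the trimmed, sorted column 2, stored
# by line number in array b. Second input: the file itself, where
# field 2 of line N is replaced by b[N].
cut -d ';' -f 2 data.csv \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | sort \
    | awk 'BEGIN { FS = OFS = ";" }
           NR == FNR { b[FNR] = $0; next }
           { $2 = b[FNR] }
           1' - data.csv > data.csv.new

# Overwrite the original file once the new contents are complete.
mv data.csv.new data.csv
cat data.csv
```

With this input the second column comes out as aaa, aaa, bbb, ccc, ddd, while the first and third columns keep their original order.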