How to use awk

Date: 2017-03-02 10:49:01

Tags: shell csv awk

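Below is some sample data. Note that this operation has to run on files with millions of records, so I need an efficient approach. Basically, we want to rewrite the second column: drop its first three '_'-separated fields and prefix what remains with the first two characters of the fourth column.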

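I have been trying to do this with cut while reading the file line by line, which is very slow. I need an awk command along the lines of the following (pseudocode, not valid awk):
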
awk -F, '{print $1","substr($4,1,2)"_"cut -f4-6 -d'_'($2)","$3","$4","$5","$6}'

Input data:

234234234,123_33_3_11111_asdf_asadfas,01,06_1234,4325325432,2
234234234,123_11_2_234111_aadsvfcvxf_anfews,01,07_4444,423425432,2
234234234,123_33_3_11111_mlkvffdg_mlkfgufks,01,08_2342,436876532,2
234234234,123_33_3_11111_qewf_mkhsdf,01,09_68645,43234532,2

Required output:

234234234,06_11111_asdf_asadfas,01,06_1234,4325325432,2
234234234,07_234111_aadsvfcvxf_anfews,01,07_4444,423425432,2
234234234,08_11111_mlkvffdg_mlkfgufks,01,08_2342,436876532,2
234234234,09_11111_qewf_mkhsdf,01,09_68645,43234532,2

1 answer:

Answer 0 (score: 2)

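You can use awk with printf to reassemble each line. Splitting on both ',' and '_' turns every piece into its own field: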

awk -F"[,_]" '{
    printf "%s,%s_%s_%s_%s,%s,%s_%s,%s,%s\n", $1,$9,$5,$6,$7,$8,$9,$10,$11,$12
}' file

which gives:

234234234,06_11111_asdf_asadfas,01,06_1234,4325325432,2
234234234,07_234111_aadsvfcvxf_anfews,01,07_4444,423425432,2
234234234,08_11111_mlkvffdg_mlkfgufks,01,08_2342,436876532,2
234234234,09_11111_qewf_mkhsdf,01,09_68645,43234532,2
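
Note that the command above relies on every row having exactly the same number of '_'-separated pieces, and it takes the whole first piece of column 4 rather than its first two characters (the two coincide for the sample data). If that cannot be guaranteed, a variant that splits on commas only and edits column 2 in place may be more robust. The following is a minimal sketch under those assumptions, not part of the original answer; the function name rewrite_col2 is hypothetical.

```shell
#!/bin/sh
# rewrite_col2 is a hypothetical helper name, not from the original answer.
# It reads CSV rows from the named files, or from stdin if none are given.
rewrite_col2() {
    awk 'BEGIN { FS = OFS = "," }
    {
        rest = $2
        # Drop the first three "_"-separated pieces of column 2.
        sub(/^[^_]*_[^_]*_[^_]*_/, "", rest)
        # Prefix what is left with the first two characters of column 4.
        $2 = substr($4, 1, 2) "_" rest
        print
    }' "$@"
}

# Demo on two of the sample rows from the question.
printf '%s\n' \
    '234234234,123_33_3_11111_asdf_asadfas,01,06_1234,4325325432,2' \
    '234234234,123_11_2_234111_aadsvfcvxf_anfews,01,07_4444,423425432,2' |
    rewrite_col2
```

On the sample rows this should print the same lines as the required output. Because awk streams the file in a single pass, it avoids spawning cut once per line, which is the usual reason the shell read loop is slow on millions of records.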