AWK command to split a file while keeping the header

Date: 2016-10-06 15:32:09

Tags: awk

awk -F "\",\"" 'NR==1 { hdr=$0; next } $10 != prev { prev=text=$10; gsub(/[^[:alnum:]_]/,"",text); $0 = hdr "\n" $0 } { print > ("test."text".batch.csv") }' test.batch1.csv

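I have an awk command that is not working correctly. It is supposed to split a file (based on the value in column $10) and put the header at the top of each output file. I have tried to understand the command line, but I can't quite follow it. Could someone explain what each part of it does?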

2 answers:

Answer 0: (score: 0)

Since you didn't provide sample input, here is a simplified version.

Suppose you want to split the file by key value, one output file per key:

$ cat file
header
1
2
2
3
3
3

$ awk 'NR==1{header=$0; next}                  # save header
    prev!=$1{fn=$1;                            # when the key changes, use it as the new file suffix,
             prev=$1;                          # save current key value,
             $0=header RS $0}                  # and insert header before first record
            {print > (FILENAME "." fn)}' file  # print records to the file

$ head file.{1..3}
==> file.1 <==
header
1

==> file.2 <==
header
2
2

==> file.3 <==
header
3
3
3
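
The simplified version above relies on equal keys being adjacent. If a key shows up again later in the input, the header is inserted a second time in the middle of that key's file. A minimal sketch of a variant that writes the header only once per output file, assuming a hypothetical comma-separated sample `demo.csv` with the key in column 1 (the file names and sample rows are made up for illustration):

```shell
# Work in a scratch directory so the demo leaves no stray files behind.
tmp=$(mktemp -d) && cd "$tmp" || exit 1

# Hypothetical sample: header plus rows whose key (column 1) is NOT grouped.
printf '%s\n' 'key,val' 'a,1' 'b,2' 'a,3' > demo.csv

awk -F, '
NR == 1 { hdr = $0; next }          # remember the header line
{
    out = "demo." $1 ".csv"         # output file named after the key
    if (!(out in seen)) {           # first row for this key:
        print hdr > out             #   write the header once
        seen[out] = 1
    }
    print > out                     # ">" truncates only on the first open in this run
}' demo.csv

cat demo.a.csv
```

Each distinct key keeps one file open until awk exits, so with a very large number of keys this may hit the open-file limit of some awk implementations.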

Answer 1: (score: 0)

awk -F "\",\"" '                      # set field separator to the three characters ","
NR==1 {                               # pick the header from the first record
    hdr=$0; next                      # and skip to next record
}
$10 != prev {                         # if the 10th field differs from the previous one
    prev=text=$10                     # prev and text are set equal to 10th field
    gsub(/[^[:alnum:]_]/,"",text)     # remove all but a-z, A-Z, 0-9, _ from text
    $0 = hdr "\n" $0                  # header precedes data
}
{                                     # f.ex. ..,"foo/bar_123",... would output
    print > ("test."text".batch.csv") # to file test.foobar_123.batch.csv
}
' test.batch1.csv                     # input file
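
To see what the `","` separator does to a fully quoted line, here is a small sketch on made-up data (the three-column sample is hypothetical, not taken from the question). The first field keeps its opening quote and the last field keeps its closing quote, which is one reason the `gsub` cleanup is useful before building a file name. It assumes an awk that supports POSIX character classes such as `[:alnum:]`.

```shell
# Hypothetical quoted CSV: one header line and one data line.
out=$(printf '%s\n' '"id","name","path"' '"1","foo","a/b_1"' |
awk -F '","' 'NR > 1 {
    key = $3                            # last field still carries the closing quote
    gsub(/[^[:alnum:]_]/, "", key)      # same cleanup as in the answer
    print NF, $1, key                   # field count, raw first field, cleaned key
}')
printf '%s\n' "$out"
```

On this sample the data line splits into 3 fields, the first field is printed as `"1` with its leading quote intact, and the cleaned key is `ab_1`.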

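If it no longer works the way it used to, I would first check whether the data file is sorted on the 10th field. The header is inserted every time the 10th field changes, so a key that reappears later gets a second header inside its file.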
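
One way to run that check is sketched below. It is an assumption-laden demo: it uses a hypothetical comma-separated `demo.csv` with the key in column 1, so for the real file you would swap in the `","` separator and `$10`. It reports every line where a key comes back after a different key has been seen in between.

```shell
# Scratch directory and hypothetical sample with a key that is not grouped.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'key,val' 'a,1' 'b,2' 'a,3' > demo.csv

report=$(awk -F, '
NR == 1 { next }                       # skip the header line
$1 != prev {                           # the key changed on this line
    if ($1 in seen)                    # key was already used earlier: not grouped
        print "line " NR ": key " $1 " reappears"
    seen[$1] = 1
    prev = $1
}' demo.csv)
printf '%s\n' "$report"
```

An empty report means equal keys are adjacent, which is all the split command needs; the data does not have to be in sorted order.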