How to split a file based on multiple column values

Time: 2017-02-03 17:18:53

Tags: unix awk split multiple-columns aix

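I need to take this test_file and split it into a separate file for each unique combination of col5 and col6. A further requirement is that each output file must be split again after 150,000 records. In addition, the naming convention has to be derived from the file contents: "$5"_"$6"_P"sysdate"_IU"$4"_60_"[File#]".zip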

test_file.csv

col1, col2, col3, col4, col5, col6 ..... col32
1234, 6789, 1, 01/31/2017 00:00:00, 1000, 1234 ..... col32
1235, 1233, 1, 01/31/2017 00:00:00, 1000, 1234 ..... col32
1236, 4423, 1, 01/31/2017 00:00:00, 1000, 5678 ..... col32
1237, 3323, 1, 01/31/2017 00:00:00, 1000, 1234 ..... col32
1238, 0808, 1, 01/31/2017 00:00:00, 1000, 1234 ..... col32
1239, 2222, 1, 01/31/2017 00:00:00, 2000, 1234 ..... col32
1231, 4535, 1, 01/31/2017 00:00:00, 2000, 1234 ..... col32
1232, 8080, 1, 01/31/2017 00:00:00, 2000, 5678 ..... col32
1233, 7878, 1, 01/31/2017 00:00:00, 2000, 5678 ..... col32

The result should look like this:

1000_1234_P20170203_IU20170131_60_1.ZIP
col1, col2, col3, col4, col5, col6 ..... col32
1234, 6789, 1, 01/31/2017 00:00:00, 1000, 1234 ..... col32
1235, 1233, 1, 01/31/2017 00:00:00, 1000, 1234 ..... col32
1237, 3323, 1, 01/31/2017 00:00:00, 1000, 1234 ..... col32
1238, 0808, 1, 01/31/2017 00:00:00, 1000, 1234 ..... col32

1000_5678_P20170203_IU20170131_60_1.ZIP
col1, col2, col3, col4, col5, col6 ..... col32
1236, 4423, 1, 01/31/2017 00:00:00, 1000, 5678 ..... col32

2000_1234_P20170203_IU20170131_60_1.ZIP
col1, col2, col3, col4, col5, col6 ..... col32
1239, 2222, 1, 01/31/2017 00:00:00, 2000, 1234 ..... col32
1231, 4535, 1, 01/31/2017 00:00:00, 2000, 1234 ..... col32

2000_5678_P20170203_IU20170131_60_1.ZIP
col1, col2, col3, col4, col5, col6 ..... col32
1232, 8080, 1, 01/31/2017 00:00:00, 2000, 5678 ..... col32
1233, 7878, 1, 01/31/2017 00:00:00, 2000, 5678 ..... col32

1 answer:

Answer 0: (score: 2)

Start with this:

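awk -F', *' -v sysdate="$(date +'%Y%m%d')" '
NR==1 { hdr = $0; next }                        # remember the header line
(cnt[$5,$6]++ % 150000) == 0 { sfx[$5,$6]++ }   # next file number every 150,000 rows per col5/col6 pair
{
    split($4,d,/[\/ ]/)                         # 01/31/2017 00:00:00 -> d[1]=01, d[2]=31, d[3]=2017
    out = $5 "_" $6 "_P" sysdate "_IU" d[3] d[1] d[2] "_60_" sfx[$5,$6] ".zip"
    if (!seen[out]++) {                         # first row for this file name: write the header
        print hdr > out
    }
    print > out
}
' file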
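
To see how the output name is put together, the same split() and concatenation can be run on a single row from the question. This is only a sketch: sysdate is pinned to 20170203 instead of calling date so the result is reproducible, the file number is hard-coded to 1, and the variable name `name` is just for illustration.

```shell
# One sample row from the question; sysdate is pinned (instead of calling
# date) and the file number is fixed at 1, purely for illustration.
name=$(printf '%s\n' '1234, 6789, 1, 01/31/2017 00:00:00, 1000, 1234' |
awk -F', *' -v sysdate=20170203 '{
    split($4, d, /[\/ ]/)     # d[1]=month, d[2]=day, d[3]=year
    print $5 "_" $6 "_P" sysdate "_IU" d[3] d[1] d[2] "_60_1.zip"
}')
echo "$name"
```

Note that the script writes a lowercase .zip extension, while the expected output in the question shows .ZIP; change the literal if the case matters.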

Adjust to suit. If you are not using GNU awk, you may need to close() files to avoid a "too many open files" error.
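
One possible way to add the close() calls is sketched below; it is not part of the original answer. It keeps only one output file open at a time: the header is written with > and the file is closed straight away, and every data row is then appended with >>, so reopening a file never truncates it. The function name `split_by_keys` and the optional `max` and `sysdate` arguments are assumptions, added only so the behaviour can be checked on a small sample.

```shell
# split_by_keys is a hypothetical helper name; max and sysdate are made
# overridable only so the behaviour is easy to check on a small sample.
split_by_keys() {
    # $1 = input CSV, $2 = rows per output file, $3 = date stamp (YYYYMMDD)
    awk -F', *' -v max="${2:-150000}" -v sysdate="${3:-$(date +%Y%m%d)}" '
    NR==1 { hdr = $0; next }
    (cnt[$5,$6]++ % max) == 0 { sfx[$5,$6]++ }
    {
        split($4, d, /[\/ ]/)
        out = $5 "_" $6 "_P" sysdate "_IU" d[3] d[1] d[2] "_60_" sfx[$5,$6] ".zip"
        if (out != prev) {            # target changed: release the previous file
            if (prev != "") close(prev)
            prev = out
        }
        if (!seen[out]++) {           # first row for this name: create it with the header
            print hdr > out
            close(out)
        }
        print >> out                  # every data row is appended
    }
    ' "$1"
}
```

Calling `split_by_keys test_file.csv` uses the 150,000-row limit and today's date. If rows for different col5/col6 pairs alternate frequently, this version opens and closes files often, so sorting the input by those columns first may be worth considering.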