Question

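I'm trying to split a large TSV file into smaller files based on column values, but I need to keep the header in every file the split creates. How can I do this?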

I have tried a few solutions, but they only work for specific files:

awk -F'\t' 'NR==1 {h=$0};NR>1{print ((!a[$5]++ && !a[$9]++ && !a[$10]++)? h ORS $0 : $0) > "file_first-" $5 "_second-" $9 "_third-" $10 ".tsv"}' file.tsv

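I want the header in every file, but at the moment it only appears in files where $5, $9 and $10 take forms like 1 1 1, 2 2 2, ... and not in the files for other permutations of those values.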

Answer 1

You probably want the following logic for each line:

calculate output_file
If !header_sent[output_file]
    Print Header to output_file
    set header_sent[output_file]
EndIf
print current line to output_file

An implementation in AWK follows. It can be turned into a one-liner by removing the comments, shortening the variable names, and so on.

NR == 1 { header = $0 }
NR > 1 {
    output_file = "file_first-" $5 "_second-" $9 "_third-" $10 ".tsv"
    # Send the header, if it has not been sent to this file yet.
    if (!header_sent[output_file]) {
        print header > output_file
        header_sent[output_file] = 1
    }
    # Print the current line.
    print $0 > output_file
}
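
As a rough sketch of the compressed form, the script below runs the same logic inline on a small made-up input. The sample rows and the short names h, f and sent are assumptions for illustration only; the output file name pattern and the -F'\t' separator are taken from the question.

```shell
#!/bin/sh
# Hypothetical demo data: a header plus three rows with 10 tab-separated columns.
# Columns 5, 9 and 10 are the split keys, as in the question.
printf 'c1\tc2\tc3\tc4\tc5\tc6\tc7\tc8\tc9\tc10\n' > file.tsv
printf 'a\tb\tc\td\t1\te\tf\tg\t2\t3\n' >> file.tsv
printf 'h\ti\tj\tk\t2\tl\tm\tn\t1\t3\n' >> file.tsv
printf 'o\tp\tq\tr\t1\ts\tt\tu\t2\t3\n' >> file.tsv

awk -F'\t' '
NR == 1 { h = $0; next }              # remember the header, do not print it yet
{
    # Build the output name from the three key columns.
    f = "file_first-" $5 "_second-" $9 "_third-" $10 ".tsv"
    # Write the header once per output file, keyed on the full file name.
    if (!(f in sent)) { print h > f; sent[f] = 1 }
    print > f                         # then write the current row
}' file.tsv
```

With this input, rows 1 and 3 share the key 1/2/3 and land in the same file under a single header, while row 2 gets its own file with its own header. If the data produces a very large number of distinct key combinations, some awk implementations may run into a limit on simultaneously open files; closing each file after writing and appending with >> is one possible workaround.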
