Question

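I have a large text file that I want to split into smaller files based on the distinct values of one column. The columns are comma-separated (it is a CSV file), and the column has many distinct values:

For example: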

1012739937,2006-11-28,d_02245211
1012739937,2006-11-28,d_02238545
1012739937,2006-11-28,d_02236564
1012739937,2006-11-28,d_01918338
1012739937,2006-11-28,d_02148765
1012739937,2006-11-28,d_00868949
1012739937,2006-11-28,d_01908448
1012740478,1998-06-26,d_01913689
1012740478,1998-06-26,i_4869
1012740478,1998-06-26,d_02174766

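I want to split the file into smaller files so that each file contains the records for a single year (one file for the 2006 records, one for the 1998 records, and so on).

(The number of years here is probably limited, but I would like to do the same thing for a column that has a large number of distinct values.)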

Answer 1

You can use awk:

awk -F, '{split($2,d,"-");print > d[1]}' file

Explanation:

-F,              tells awk that input fields are separated by ','

split($2,d,"-")  splits the second column (the date) by '-'
                 and puts the bits into the array 'd'

print > d[1]     prints the whole input line into a file named after the year
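
The question also asks about columns with a large number of distinct values. Each `print > name` keeps its output file open, so some awk implementations may run into the limit on open files in that case. A minimal sketch of one way around this, assuming it is acceptable to append and that no files named after the years are left over from an earlier run (the sample rows and the temporary directory are only for illustration):

```shell
# Work in a scratch directory with a few rows from the question's sample data.
cd "$(mktemp -d)"
printf '%s\n' \
  '1012739937,2006-11-28,d_02245211' \
  '1012739937,2006-11-28,d_02238545' \
  '1012740478,1998-06-26,i_4869' > file

# Append each line to the file named after its year, then close that file
# immediately, so only one output file is open at any time.
awk -F, '{ split($2, d, "-"); print >> d[1]; close(d[1]) }' file
```

Opening and closing a file for every line is slower than keeping the files open, so this is probably only worth doing when the simpler command actually fails.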

Answer 2

A quick awk solution, if slightly fragile (it assumes that the second column, if present, always starts with yyyy):

awk -F, '$2{print > (substr($2,1,4) ".csv")}' test.in

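This splits the input into files named yyyy.csv. Make sure no files with those names already exist in your current directory, otherwise they will be overwritten.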
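
The fragility mentioned above can be reduced by checking that the second column really starts with four digits and a hyphen before using it as a file name. The following is only a sketch: the file name `unmatched.csv` and the malformed sample rows are assumptions made for illustration.

```shell
# Work in a scratch directory so that no existing yyyy.csv file is overwritten.
cd "$(mktemp -d)"
printf '%s\n' \
  '1012739937,2006-11-28,d_02245211' \
  '1012740478,1998-06-26,i_4869' \
  'broken line without a date' \
  '1012740478,n/a,d_02174766' > test.in

# Lines whose second column starts with yyyy- go to yyyy.csv;
# everything else is collected in unmatched.csv (a hypothetical name).
awk -F, '
  $2 ~ /^[0-9][0-9][0-9][0-9]-/ { print > (substr($2, 1, 4) ".csv"); next }
  { print > "unmatched.csv" }
' test.in
```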

Answer 3

Another awk approach, using a slightly more complex field separator:

awk -F '[,-]' '{print > $2}' file
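
To see why `$2` is the year here, it may help to look at how the separator `[,-]` splits one of the sample lines. This is just an illustrative check, not part of the answer; the variable names are made up for the example.

```shell
line='1012739937,2006-11-28,d_02245211'

# With both ',' and '-' acting as separators, the date is spread over
# fields 2, 3 and 4, so the year ends up in field 2.
year=$(printf '%s\n' "$line" | awk -F '[,-]' '{ print $2 }')
nf=$(printf '%s\n' "$line" | awk -F '[,-]' '{ print NF }')

echo "year=$year fields=$nf"
```

A hyphen in any other column would also count as a separator and shift the field numbers after it. The output lines themselves are not affected, because a bare `print` writes the original line unchanged.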

Chunking a large file based on a regular expression (Linux)
