Split a .txt file based on its content

Time: 2011-12-17 12:18:42

Tags: bash sed awk

I have a huge *.txt file that looks like this:

~~~~~~~~ small file content 1
~~~~~~~~ small file content 2
...
~~~~~~~~ small file content n

How can I split it into n files, preferably with bash?

3 answers:

Answer 0 (score: 13)

Use csplit:

$ csplit --help
Usage: csplit [OPTION]... FILE PATTERN...
Output pieces of FILE separated by PATTERN(s) to files `xx00', `xx01', ...,
and output byte counts of each piece to standard output.
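As a minimal sketch of how csplit could be applied to the file from the question, the following assumes GNU csplit (the `-z` option and the `{*}` repeat count are GNU extensions and may not exist in other implementations). The `part_` prefix and the sample data are made up for illustration.

```shell
# Work in a scratch directory so the generated pieces are easy to inspect.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input: every record starts with eight tildes, as in the question.
printf '%s\n' \
  '~~~~~~~~ small file content 1' \
  '~~~~~~~~ small file content 2' \
  '~~~~~~~~ small file content 3' > BIG_FILE

# Cut the file before every line that matches the pattern.
#   -s        do not print the byte count of each piece
#   -z        drop empty pieces (the empty piece before the first match)
#   -f part_  hypothetical output prefix: part_00, part_01, ...
#   '{*}'     repeat the pattern as many times as possible (GNU extension)
csplit -s -z -f part_ BIG_FILE '/^~~~~~~~~/' '{*}'

ls part_*
```

Because the pattern is a regular expression, the same command also works when each record spans several lines, as long as every record begins with the tilde marker.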

Answer 1 (score: 0)

Using awk:

awk 'BEGIN {c=1} NR > 1 && NR % 10000 == 1 { c++ } { print $0 > ("splitfile_" c) }' LARGEFILE

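That will do it. It sets up a counter and increments it every 10,000 lines, then writes each line to the `splitfile_` file numbered with the current counter value.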
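The counter approach can be tried on a small input first. The sketch below is a variation, not the answer's exact command: it takes the chunk size from a hypothetical variable `n`, starts a new file on the first line of each chunk, and closes the previous file so that only one output file is open at a time. The 25-line sample file is made up for illustration.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input: 25 numbered lines.
seq 1 25 > LARGEFILE

# Start a new output file on lines 1, n+1, 2n+1, ...
# close() releases the previous file before the next one is opened.
awk -v n=10 '
  (NR - 1) % n == 0 { if (out) close(out); out = "splitfile_" (++c) }
  { print > out }
' LARGEFILE

wc -l splitfile_*
```

With `n=10` and 25 input lines, this should leave two files of 10 lines and a third with the remaining 5.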

HTH

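Answer 2 (score: 0)

If each piece of content in your HUGE text file sits on its own line (i.e. every line is one chunk you want to split out), then this should work:

One-liner: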

awk '{print >("SMALL_BATCH_OF_FILES_" NR)}' BIG_FILE

Test:

[jaypal:~/Temp] cat BIG_FILE
~~~~~~~~ small file content 1
~~~~~~~~ small file content 2
~~~~~~~~ small file content 3
~~~~~~~~ small file content 4
~~~~~~~~ small file content n-1
~~~~~~~~ small file content n

[jaypal:~/Temp] awk '{print >("SMALL_BATCH_OF_FILES_" NR)}' BIG_FILE

[jaypal:~/Temp] ls -lrt SMALL_BATCH_OF_FILES_*
-rw-r--r--  1 jaypalsingh  staff  30 17 Dec 14:19 SMALL_BATCH_OF_FILES_6
-rw-r--r--  1 jaypalsingh  staff  32 17 Dec 14:19 SMALL_BATCH_OF_FILES_5
-rw-r--r--  1 jaypalsingh  staff  30 17 Dec 14:19 SMALL_BATCH_OF_FILES_4
-rw-r--r--  1 jaypalsingh  staff  30 17 Dec 14:19 SMALL_BATCH_OF_FILES_3
-rw-r--r--  1 jaypalsingh  staff  30 17 Dec 14:19 SMALL_BATCH_OF_FILES_2
-rw-r--r--  1 jaypalsingh  staff  30 17 Dec 14:19 SMALL_BATCH_OF_FILES_1

[jaypal:~/Temp] cat SMALL_BATCH_OF_FILES_1 
~~~~~~~~ small file content 1
[jaypal:~/Temp] cat SMALL_BATCH_OF_FILES_2 
~~~~~~~~ small file content 2
[jaypal:~/Temp] cat SMALL_BATCH_OF_FILES_6
~~~~~~~~ small file content n
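One caveat for a truly huge input: the one-liner never closes its output files, and some awk implementations limit how many files can be open at once. A minimal sketch of a variant that closes each file right after writing it, using made-up sample data, could look like this:

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input: three one-line records.
printf '~~~~~~~~ small file content %s\n' 1 2 3 > BIG_FILE

# Write line NR to its own file, then close that file immediately
# so the number of open files stays constant.
awk '{ out = "SMALL_BATCH_OF_FILES_" NR; print > out; close(out) }' BIG_FILE

cat SMALL_BATCH_OF_FILES_2
```

Since each file name is used exactly once, closing and never reopening it does not risk truncating earlier output.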