Split a large file into smaller files based on a line

Date: 2016-07-26 13:08:33

Tags: python bash shell split

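I have a very large file (over 20 GB) that I want to split into smaller files, for example several files of about 2 GB each.

One constraint: each split has to happen just before a specific line, namely a line that starts with Recno::.

I'm using Python, but if there is another solution in the shell, I'm all for it.

This is what the big file looks like: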

bigfile.txt (20 GB)

Recno:: 0
some data...

Recno:: 1
some data...

Recno:: 2
some data...

Recno:: 3
some data...

Recno:: 4
some data...

Recno:: 5
some data...

Recno:: x
some more data...

This is what I want:

file1.txt (2 GB +/-)

Recno:: 0
some data...

Recno:: 1
some data...

file2.txt (2 GB +/-)

Recno:: 2
some data...

Recno:: 4
some data...

Recno:: 5
some data...

And so on...

Thanks!

2 answers:

Answer 0: (score: 1)

You could do something like this:

import sys

try:
    _, size, filename = sys.argv
    size = int(size)
except ValueError:
    sys.exit('Usage: splitter.py <size in bytes> <filename to split>')

# Binary mode, so that len(line) counts bytes and matches the size argument
with open(filename, 'rb') as infile:
    count = 0
    current_size = 0
    # you could do something fancier with the name,
    # e.g. use os.path.splitext
    outfile = open(filename + '_0', 'wb')
    try:
        for line in infile:
            # Start a new output file only at a record boundary,
            # once the current file has reached the requested size
            if current_size >= size and line.startswith(b'Recno'):
                outfile.close()
                count += 1
                current_size = 0
                outfile = open(filename + '_{}'.format(count), 'wb')
            current_size += len(line)
            outfile.write(line)
    finally:
        outfile.close()
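
To try the approach without a 20 GB input, here is a minimal sketch of the same idea wrapped in a function and run on a tiny throwaway file. The function name split_on_marker, its parameters, and the 40-byte limit are assumptions made for illustration only; the output names follow the <filename>_<n> pattern used above.

```python
import os
import tempfile


def split_on_marker(path, max_bytes, marker=b"Recno"):
    """Split `path` into pieces of roughly `max_bytes` bytes.

    A new piece is started only at a line that begins with `marker`.
    Returns the list of output paths (hypothetical helper, not a library API).
    """
    outputs = []
    outfile = None
    current_size = 0
    try:
        with open(path, "rb") as infile:
            for line in infile:
                at_boundary = line.startswith(marker)
                # Open the first piece, or roll over at a record boundary
                # once the current piece has reached the size limit.
                if outfile is None or (current_size >= max_bytes and at_boundary):
                    if outfile is not None:
                        outfile.close()
                    name = "{}_{}".format(path, len(outputs))
                    outputs.append(name)
                    outfile = open(name, "wb")
                    current_size = 0
                outfile.write(line)
                current_size += len(line)
    finally:
        if outfile is not None:
            outfile.close()
    return outputs


# Build a small sample in the same shape as bigfile.txt (six records).
sample = "".join(
    "Recno:: {}\nsome data...\n\n".format(i) for i in range(6)
).encode("ascii")

tmpdir = tempfile.mkdtemp()
demo_path = os.path.join(tmpdir, "bigfile.txt")
with open(demo_path, "wb") as f:
    f.write(sample)

# A deliberately tiny limit so that the sample is split into several pieces.
pieces = split_on_marker(demo_path, max_bytes=40)
for piece in pieces:
    print(os.path.basename(piece), os.path.getsize(piece))
```

Each piece can exceed the limit by up to one record, because the cut is deferred until the next Recno line. That matches the "2 GB +/-" tolerance in the question.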

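Answer 1: (score: -1)

As mentioned above, you can use split in the bash shell. Note that -b cuts at fixed byte offsets, so the pieces will not necessarily begin on a Recno line. For pieces of roughly 2 GB:

split -b 2000m <path-to-your-file>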
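
A split made purely by byte count cuts wherever the size limit happens to fall. The sketch below emulates fixed-size cutting in Python on a small sample (it does not call the split utility itself) and reports which pieces do not start on a Recno line. The helper name fixed_size_chunks and the 40-byte chunk size are assumptions made for illustration.

```python
def fixed_size_chunks(data, size):
    """Cut `data` into consecutive slices of `size` bytes (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


# Sample in the same shape as bigfile.txt (six records).
sample = "".join(
    "Recno:: {}\nsome data...\n\n".format(i) for i in range(6)
).encode("ascii")

chunks = fixed_size_chunks(sample, 40)

# Indices of chunks that begin in the middle of a record.
misaligned = [i for i, chunk in enumerate(chunks) if not chunk.startswith(b"Recno")]
print("chunks:", len(chunks), "misaligned:", misaligned)
```

Any chunk listed as misaligned begins partway through a record, which is the case the question wants to avoid.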