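I have a very large file (over 20 GB) that I want to split into smaller files, say several files of about 2 GB each.
The one catch is that I can only split right before a specific line (a line starting with Recno::).
I'm using Python, but if there is another solution in the shell, I'm all for it.
This is what the big file looks like: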
bigfile.txt
(20GB)
Recno:: 0
some data...
Recno:: 1
some data...
Recno:: 2
some data...
Recno:: 3
some data...
Recno:: 4
some data...
Recno:: 5
some data...
Recno:: x
some more data...
This is what I want:
file1.txt
(2 GB +/-)
Recno:: 0
some data...
Recno:: 1
some data...
file2.txt
(2GB +/-)
Recno:: 2
some data...
Recno:: 4
some data...
Recno:: 5
some data...
And so on...
Thanks!
Answer 0 (score: 1)
You can do something like this:
import sys

try:
    _, size, file = sys.argv
    size = int(size)
except ValueError:
    sys.exit('Usage: splitter.py <size in bytes> <filename to split>')

with open(file) as infile:
    count = 0
    current_size = 0
    # you could do something more
    # fancy with the name like use
    # os.path.splitext
    outfile = open(file+'_0', 'w+')
    for line in infile:
        if current_size > size and line.startswith('Recno'):
            outfile.close()
            count += 1
            current_size = 0
            outfile = open(file+'_{}'.format(count), 'w+')
        current_size += len(line)
        outfile.write(line)
    outfile.close()
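The script above opens the files in text mode, so len(line) counts characters rather than bytes. For data that is not plain ASCII, a variant that works on bytes may track the size more closely. The following is only a sketch of that idea, not a replacement for the answer: the function name split_on_records, its parameters, and the use of os.path.splitext for the output names (as the comment in the script suggests) are assumptions made for illustration.

```python
import os


def split_on_records(path, max_bytes, marker=b"Recno"):
    """Split `path` into numbered parts, breaking only before a marker line.

    Hypothetical helper: the name, parameters, and output naming scheme
    are illustrative assumptions, not part of the original answer.
    Returns the list of part file names that were written.
    """
    root, ext = os.path.splitext(path)
    count = 0
    written = 0  # bytes written to the current part
    name = "{}_{}{}".format(root, count, ext)
    parts = [name]
    out = open(name, "wb")
    try:
        # Binary mode: len(line) is a byte count and no decoding is done.
        with open(path, "rb") as infile:
            for line in infile:
                # Start a new part only once the size limit is reached
                # AND the next line begins a new record.
                if written >= max_bytes and line.startswith(marker):
                    out.close()
                    count += 1
                    written = 0
                    name = "{}_{}{}".format(root, count, ext)
                    parts.append(name)
                    out = open(name, "wb")
                out.write(line)
                written += len(line)
    finally:
        out.close()
    return parts
```

Assuming the input is named bigfile.txt, a call such as split_on_records("bigfile.txt", 2 * 1024**3) would write bigfile_0.txt, bigfile_1.txt, and so on. Each part can run slightly over the limit, because the split waits for the next Recno line.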
Answer 1 (score: -1)
As mentioned above, you can use split in a bash shell:
split -b 20000m <path-to-your-file>