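How to split a single txt file into multiple txt files with Python

Date: 2016-11-24 04:06:53

Tags: python

I have a single txt file that I want to split into multiple files based on the *TEXT ID.

For example, the single txt file looks like this: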

*TEXT 017 01/04/63 PAGE 020
THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE.....
*TEXT 018 01/04/63 PAGE 021
RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE.....
*TEXT 019 01/04/63 PAGE 021
BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE
AGAINST HIM, FOR WEIDNER, 40, WAS A....

How can I split this into multiple txt files?

filename:
TEXT017.txt

filename:
TEXT018.txt

filename:
TEXT019.txt

2 answers:

Answer 0 (score: 2)

Split the text into pieces at the header line that starts each new text ID:

import re

raw_string = """*TEXT 017 01/04/63 PAGE 020
THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE.....
*TEXT 018 01/04/63 PAGE 021
RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE.....
*TEXT 019 01/04/63 PAGE 021
BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE
AGAINST HIM, FOR WEIDNER, 40, WAS A...."""

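# The capturing group keeps each header line in the result list
split_string = re.split(r'(.*TEXT .*PAGE \d+)', raw_string)
for item in split_string:
    if not item:
        continue  # skip the empty string before the first header
    print('------')
    print(item)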

------
*TEXT 017 01/04/63 PAGE 020
------

THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE.....

------
*TEXT 018 01/04/63 PAGE 021
------

RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE.....

------
*TEXT 019 01/04/63 PAGE 021
------

BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE
AGAINST HIM, FOR WEIDNER, 40, WAS A....
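
The list returned by `re.split` alternates between header lines and bodies, so the two can be paired up before anything is written to disk. Below is a minimal sketch of that pairing, assuming every header looks like the ones in the question. The helper name `split_blocks` and the shortened `sample` text are made up for illustration.

```python
import re

def split_blocks(text):
    """Return a dict mapping each text ID (e.g. '017') to its body."""
    # With a capturing group the headers stay in the result, so the list
    # looks like: [prefix, header, body, header, body, ...]
    parts = re.split(r'^(\*TEXT .*)$', text, flags=re.MULTILINE)
    blocks = {}
    # parts[1::2] are the headers, parts[2::2] the bodies that follow them
    for header, body in zip(parts[1::2], parts[2::2]):
        text_id = re.search(r'\*TEXT (\d+)', header).group(1)
        blocks[text_id] = body.strip('\n')
    return blocks

# Shortened stand-in for the text in the question
sample = (
    "*TEXT 017 01/04/63 PAGE 020\n"
    "FIRST BODY\n"
    "MORE\n"
    "*TEXT 018 01/04/63 PAGE 021\n"
    "SECOND BODY"
)
print(split_blocks(sample))
```

Anchoring the pattern with `^` and `$` under `re.MULTILINE` means only whole lines that start with `*TEXT` count as headers, which is a little stricter than the `.*TEXT .*PAGE \d+` pattern used above.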

Answer 1 (score: 2)

Inspired by @n1c9, I modified the code and added a few things to make it complete.

import re

raw_string = """*TEXT 017 01/04/63 PAGE 020
THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE.....
*TEXT 018 01/04/63 PAGE 021
RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE.....
*TEXT 019 01/04/63 PAGE 021
BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE
AGAINST HIM, FOR WEIDNER, 40, WAS A...."""

# split on the header lines; the capturing group keeps them in the result
split_strings = re.split(r'\n?(\*TEXT .*)\n', raw_string)
blocks = [s for s in split_strings if s]  # drop empty strings

# blocks now alternates: header, content, header, content, ...
for i in range(0, len(blocks), 2):
    # extract `019` from `*TEXT 019 01/04/63 PAGE 021`
    num = re.search(r'TEXT (\d+)', blocks[i]).group(1)

    # save content to `TEXT019.txt`
    filename = 'TEXT%s.txt' % num
    content = blocks[i+1]
    with open(filename, 'w') as fp:
        fp.write(content)
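
The question starts from a file on disk rather than a string, so the same idea can also be applied line by line without loading the whole file first. The following is only a sketch under that assumption: the function name `split_file` and its `out_dir` parameter are hypothetical, and as in the answer above the header line itself is not copied into the output file.

```python
import os
import re

# A header is any line that starts with `*TEXT ` followed by digits
HEADER = re.compile(r'^\*TEXT (\d+)')

def split_file(path, out_dir='.'):
    """Write each *TEXT block of `path` to TEXT<id>.txt inside `out_dir`.

    Returns the list of file names written, in order.
    """
    written = []
    out = None
    try:
        with open(path) as src:
            for line in src:
                match = HEADER.match(line)
                if match:
                    # A header closes the previous block and opens a new file
                    if out is not None:
                        out.close()
                    name = os.path.join(out_dir, 'TEXT%s.txt' % match.group(1))
                    out = open(name, 'w')
                    written.append(name)
                elif out is not None:
                    # Body line: copy it to the current block's file
                    out.write(line)
                # Lines before the first header are ignored
    finally:
        if out is not None:
            out.close()
    return written
```

Calling `split_file('input.txt')`, where `input.txt` stands for your own file, would create `TEXT017.txt`, `TEXT018.txt` and so on in the current directory. If the same ID appears twice in the input, the later block overwrites the earlier file.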