I have a single txt file that I want to split into multiple files based on the *TEXT ID.
For example, the single txt file looks like this:
*TEXT 017 01/04/63 PAGE 020
THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE.....
*TEXT 018 01/04/63 PAGE 021
RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE.....
*TEXT 019 01/04/63 PAGE 021
BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE
AGAINST HIM, FOR WEIDNER, 40, WAS A....
How can I split it into multiple txt files, one per *TEXT ID, named like this?
filename:
TEXT017.txt
filename:
TEXT018.txt
filename:
TEXT019.txt
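Answer 0 (score: 2)
Split the text into chunks wherever a new *TEXT header line begins. Because the pattern is wrapped in a capturing group, re.split keeps the header lines in the result: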
import re
raw_string = """*TEXT 017 01/04/63 PAGE 020
THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE.....
*TEXT 018 01/04/63 PAGE 021
RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE.....
*TEXT 019 01/04/63 PAGE 021
BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE
AGAINST HIM, FOR WEIDNER, 40, WAS A...."""
# The capturing group makes re.split return the header lines as well as the bodies.
split_string = re.split(r'(.*TEXT .*PAGE \d+)', raw_string)
for item in split_string:
    print('------')
    print(item)
Output (empty items and blank lines omitted):
------
*TEXT 017 01/04/63 PAGE 020
------
THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE.....
------
*TEXT 018 01/04/63 PAGE 021
------
RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE.....
------
*TEXT 019 01/04/63 PAGE 021
------
BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE
AGAINST HIM, FOR WEIDNER, 40, WAS A....
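The snippet above only prints the pieces. As a rough sketch of how the same re.split idea could be carried through to the files the question asks for, the headers and bodies can be paired up and written out. The function name split_to_files, the out_dir parameter, and the UTF-8 encoding are assumptions for illustration, not part of the original answer.

```python
import re
from pathlib import Path

# One header line such as "*TEXT 017 01/04/63 PAGE 020".
# re.MULTILINE lets ^ and $ match at the start and end of each line.
HEADER = re.compile(r'^(\*TEXT .*PAGE \d+)$', re.MULTILINE)


def split_to_files(text, out_dir):
    """Write each *TEXT block of `text` to TEXT<id>.txt inside `out_dir`.

    Hypothetical helper: returns the list of file names it wrote.
    """
    parts = HEADER.split(text)
    # parts[0] is whatever precedes the first header (usually empty);
    # after that the list alternates header, body, header, body, ...
    written = []
    for header, body in zip(parts[1::2], parts[2::2]):
        text_id = header.split()[1]          # "017" from "*TEXT 017 ..."
        path = Path(out_dir) / f'TEXT{text_id}.txt'
        # Drop the newlines left over around the body, end with exactly one.
        path.write_text(body.strip('\n') + '\n', encoding='utf-8')
        written.append(path.name)
    return written
```

Calling split_to_files(raw_string, '.') with the raw_string from this answer should create TEXT017.txt, TEXT018.txt and TEXT019.txt in the current directory, each holding only the body text of its block.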
Answer 1 (score: 2)
import re
raw_string = """*TEXT 017 01/04/63 PAGE 020
THE ALLIES AFTER NASSAU IN DECEMBER 1960, THE U.S . FIRST
PROPOSED TO HELP NATO DEVELOP ITS OWN NUCLEAR STRIKE FORCE . BUT EUROPE.....
*TEXT 018 01/04/63 PAGE 021
RUSSIA WHO'S IN CHARGE HERE ? IT WAS IN 1954 THAT NIKITA
KHRUSHCHEV LAUNCHED HIS GRANDIOSE " VIRGIN LANDS " GAMBLE . PART OF THE.....
*TEXT 019 01/04/63 PAGE 021
BERLIN ONE LAST RUN HANS WEIDNER HAD BEEN HOPING FOR MONTHS TO
ESCAPE DRAB EAST GERMANY AND MAKE HIS WAY TO THE WEST . THE ODDS WERE
AGAINST HIM, FOR WEIDNER, 40, WAS A...."""
# Split on each header line; the capturing group keeps the header in the result.
split_strings = re.split(r'\n?(\*TEXT .*)\n', raw_string)
blocks = [s for s in split_strings if s]  # filter out blank strings
# blocks now alternates: header, content, header, content, ...
for i in range(0, len(blocks), 2):
    # extract `019` from `*TEXT 019 01/04/63 PAGE 021`
    num = re.search(r'TEXT (\d+)', blocks[i]).group(1)
    # save content to `TEXT019.txt`
    filename = 'TEXT%s.txt' % num
    content = blocks[i + 1]
    with open(filename, 'w') as fp:
        fp.write(content)
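Both answers work on a string that is already in memory, while the question starts from a file on disk. One possible variant, sketched here under assumptions, reads the source file line by line and switches output files whenever a *TEXT header appears, so the whole file never has to be loaded at once. The names split_file, src_path and out_dir are hypothetical, and the UTF-8 encoding is a guess about the data.

```python
import os
import re

# Matches a header line and captures the ID, e.g. "017".
HEADER = re.compile(r'^\*TEXT (\d+)')


def split_file(src_path, out_dir='.'):
    """Stream `src_path` and write each block to TEXT<id>.txt in `out_dir`.

    Hypothetical helper: header lines themselves are not copied, and any
    lines before the first header are ignored. Returns the names written.
    """
    out = None
    names = []
    try:
        with open(src_path, encoding='utf-8') as src:
            for line in src:
                match = HEADER.match(line)
                if match:
                    # A new block starts: close the previous output file.
                    if out is not None:
                        out.close()
                    name = 'TEXT%s.txt' % match.group(1)
                    out = open(os.path.join(out_dir, name), 'w', encoding='utf-8')
                    names.append(name)
                elif out is not None:
                    out.write(line)
    finally:
        # Make sure the last output file is closed even if an error occurs.
        if out is not None:
            out.close()
    return names
```

Note that if the same ID appears twice in the source, the later block overwrites the earlier file; whether that matters depends on the data.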