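I scraped several articles about terrorist attacks. From each article I want to extract one specific passage.
Here is a sample from one of the articles: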
By DAVID D. KIRKPATRICK MARCH 18, 2015
Scenes from Tunisian state television showed confusion outside an art museum and Parliament on Wednesday after gunmen attacked.
CAIRO — Gunmen in military uniforms killed 19 people on Wednesday in a
midday attack on a museum in downtown Tunis, dealing a new blow to the tourist industry
that is vital to Tunisia as it struggles to consolidate the only transition to democracy
after the Arab Spring revolts.
Tunisian officials had initially said that the attackers took 10
hostages and killed nine people, including seven foreign visitors and two Tunisians.
What I want to extract for further analysis is, in this example, the text from "CAIRO — " up to the first full stop.
I came up with this regular expression:
([A-Z]+(?:\W+\w+)?)\s*—[\s\S]+\.\s
With this regular expression I only extract the start of the paragraph, not the rest of it.
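Two properties of the pattern above may explain the result. Its only capture group wraps the dateline (CAIRO), so a tool that returns group 1 returns just that part. Also, `[\s\S]+` is greedy, so the full match runs to the last period followed by whitespace rather than the first. Below is a minimal sketch of one possible fix, assuming Python's `re` module (the post does not say which language is used). The name `extract_lead` and the group names `dateline` and `sentence` are hypothetical, chosen only for illustration.

```python
import re
from typing import Optional

# Sample text copied from the article excerpt in the question.
sample = (
    "By DAVID D. KIRKPATRICK MARCH 18, 2015\n"
    "Scenes from Tunisian state television showed confusion outside an art "
    "museum and Parliament on Wednesday after gunmen attacked.\n"
    "CAIRO — Gunmen in military uniforms killed 19 people on Wednesday in a\n"
    "midday attack on a museum in downtown Tunis, dealing a new blow to the tourist industry\n"
    "that is vital to Tunisia as it struggles to consolidate the only transition to democracy\n"
    "after the Arab Spring revolts.\n"
    "Tunisian officials had initially said that the attackers took 10\n"
    "hostages and killed nine people, including seven foreign visitors and two Tunisians.\n"
)

LEAD_RE = re.compile(
    r"(?P<dateline>[A-Z]+(?:\W+\w+)?)"  # dateline part, unchanged from the original pattern
    r"\s*—\s*"                          # em dash between dateline and text
    r"(?P<sentence>[\s\S]+?\.)"         # lazy: shortest run that ends in a period
    r"(?=\s|$)"                         # the period must precede whitespace or end of text
)


def extract_lead(article: str) -> Optional[str]:
    """Return the text from the dateline through the first full stop, or None."""
    match = LEAD_RE.search(article)
    if match is None:
        return None
    # The passage spans several lines, so collapse runs of whitespace to single spaces.
    return " ".join(match.group(0).split())


if __name__ == "__main__":
    print(extract_lead(sample))
```

The lazy quantifier `+?` is what makes the match stop at the first full stop. One caveat of this sketch: an abbreviation such as "U.S." inside the first sentence would also end the match early, so real articles may need a stricter sentence-boundary rule.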