Adding quotes around sentences that contain the word "said"

Time: 2013-11-07 00:43:27

Tags: python regex text

OK regex gurus, I have a very long text, and I'm trying to add quotes around the sentences that contain "he said" and similar variations.

For example:

s = 'This should have no quotes. This one should he said. But this one should not. Neither should this. But this one should she said.'

should result in:

This should have no quotes. "This one should," he said. But this one should not. Neither should this. "But this one should," she said.

So far I can get pretty close, but it's not quite right:

>>> import re
>>> m = re.sub(r'\.\W(.*?) (he|she|it) said.', r'. "\1," \2 said.', s)

Result:

>>> print(m)
This should have no quotes. "This one should," he said. But this one should not. "Neither should this. But this one should," she said.

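As you can see, it puts the quotes around the first instance correctly, but for the second instance the opening quote comes too early. Any help is appreciated!

1 answer:

Answer 0 (score: 2)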

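The comments point out several valid edge cases, but to address the problem you are facing:

It quotes the whole span because the match starts at the period at the end of "one should not." and the lazy .*? is then free to run across the next period as well. What you really want is to start the quote only after the last period before "said". So, inside the capturing group, make sure periods are not allowed, like so: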

m = re.sub(r'\.\W([^\.]*?) (he|she|it) said.', r'. "\1," \2 said.', s)
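
As a quick check, the pattern above can be run against the sample string from the question. This is only a minimal sketch: the names pattern and result are illustrative, and the final period after "said" is escaped here, which the pattern above does not do.

```python
import re

s = ('This should have no quotes. This one should he said. '
     'But this one should not. Neither should this. '
     'But this one should she said.')

# [^.]*? cannot contain a period, so the captured group
# can no longer run across a sentence boundary.
pattern = r'\.\W([^.]*?) (he|she|it) said\.'
result = re.sub(pattern, r'. "\1," \2 said.', s)
print(result)
```

Note that the pattern begins with \.\W, so a "he said" sentence at the very start of the text has no preceding period and would be left unquoted.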

This fails for sentences that contain a period inside them, such as "Dr. Seuss likes to eat, she said", but that's another problem.
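
The abbreviation problem can be reproduced with a short sketch. The input string t below is made up for illustration and is not from the question; the pattern is the one from this answer, with the final period escaped.

```python
import re

# Hypothetical input containing an abbreviation with a period.
t = 'It is late. Dr. Seuss likes to eat she said.'

pattern = r'\.\W([^.]*?) (he|she|it) said\.'
broken = re.sub(pattern, r'. "\1," \2 said.', t)
print(broken)
# The opening quote lands after "Dr." instead of before it:
# It is late. Dr. "Seuss likes to eat," she said.
```

Handling this case would probably need a sentence splitter that knows about abbreviations, which a single regex of this kind does not provide.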