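OK regex masters, I have a very long text and I am trying to add quotation marks around sentences that contain "he said" and similar variations.
For example: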
s = 'This should have no quotes. This one should he said. But this one should not. Neither should this. But this one should she said.'
should result in:
This should have no quotes. "This one should," he said. But this one should not. Neither should this. "But this one should," she said.
So far I can get pretty close, but not quite right:
>>> import re
>>> m = re.sub(r'\.\W(.*?) (he|she|it) said.', r'. "\1," \2 said.', s)
Result:
>>> print m
This should have no quotes. "This one should," he said. But this one should not. "Neither should this. But this one should," she said.
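As you can see, it places the quotes correctly around the first instance, but for the second instance the opening quote comes one sentence too early. Any help is appreciated!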
Answer 0 (score: 2)
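A few other valid cases have been pointed out in the comments, but to fix the problem you are facing:
The match starts at the period that ends "one should not." and, because (.*?) can match any character, it runs across the next period until it reaches "she said". That is why two sentences end up inside the quotes.
What you really want is for the quote to start after the last period before "she said". So, inside the capturing parentheses, make sure periods cannot be matched, like this: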
m = re.sub(r'\.\W([^\.]*?) (he|she|it) said.', r'. "\1," \2 said.', s)
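As a quick check, here is a small Python 3 sketch that runs both patterns against the sample string from the question. The names ORIGINAL, ADJUSTED and REPLACEMENT are just labels chosen for this example; the patterns themselves are the ones shown above.

```python
import re

s = ('This should have no quotes. This one should he said. '
     'But this one should not. Neither should this. '
     'But this one should she said.')

# Pattern from the question: (.*?) is lazy, but it can still
# cross a period while searching for "he/she/it said".
ORIGINAL = r'\.\W(.*?) (he|she|it) said.'

# Adjusted pattern: [^\.]*? cannot match a period, so the
# captured text stays inside a single sentence.
ADJUSTED = r'\.\W([^\.]*?) (he|she|it) said.'

REPLACEMENT = r'. "\1," \2 said.'

before = re.sub(ORIGINAL, REPLACEMENT, s)
after = re.sub(ADJUSTED, REPLACEMENT, s)

print(before)  # second quote still spans two sentences
print(after)   # each quote covers only the sentence before "said"
```

Note that both patterns require a period before the sentence, so a "he said" sentence at the very start of the text would not be quoted.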
The pattern with [^\.] still fails when a period appears inside the sentence itself, as in "Dr. Seuss likes to eat, she said", but that is a separate problem.
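To make that limitation concrete, here is a minimal Python 3 sketch. The input string is a hypothetical example made up for illustration, written without the comma so that it has the same shape as the sentences in the question.

```python
import re

ADJUSTED = r'\.\W([^\.]*?) (he|she|it) said.'
REPLACEMENT = r'. "\1," \2 said.'

# Hypothetical input: the abbreviation "Dr." puts a period mid-sentence.
text = 'He waved. Dr. Seuss likes to eat she said.'

result = re.sub(ADJUSTED, REPLACEMENT, text)
print(result)
# He waved. Dr. "Seuss likes to eat," she said.
```

The opening quote lands after "Dr." instead of before it, because the regex treats every period as a sentence boundary. Handling abbreviations would probably need a list of known abbreviations or a proper sentence tokenizer rather than a single regex.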