Question

I need a Python regular expression that helps me strip illegal characters from words.

The conditions are as follows:

Test data:

 s = "there is' -potato 'all' around- 'the 'farm-"

Expected output:

>>> print(s)
there is' potato all' around the farm

This is my code at the moment, but it does not work properly:

newLine = re.findall(r'[a-z][-\'a-z]*[\'a-z]?', s)

Any help is much appreciated. Thanks!

Answer 1

Just match the characters you don't want and remove them with re.sub:

>>> import re
>>> s = """potato
-potato
'human'
potatoes-"""
>>> m = re.sub(r"(?m)^['-]|-$", r'', s)
>>> print(m)
potato
potato
human'
potatoes

OR

>>> m = re.sub(r"(?m)^(['-])?([a-z'-]*?)-?$", r'\2', s)
>>> print(m)
potato
potato
human'
potatoes
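The examples above work on one word per line. As a rough sketch of how the first pattern could be applied to the space-separated string from the question, one option is to put each word on its own line, run the substitution, and join the words back together. The variable names `lines`, `cleaned` and `result` are only illustrative, and this assumes words are separated by plain whitespace.

```python
import re

s = "there is' -potato 'all' around- 'the 'farm-"

# Put one word per line so that ^ and $ (with the (?m) flag) mark word edges.
lines = "\n".join(s.split())

# Drop a single leading ' or - and a single trailing - on every line.
cleaned = re.sub(r"(?m)^['-]|-$", "", lines)

# Turn the lines back into a space-separated string.
result = " ".join(cleaned.split("\n"))
print(result)  # there is' potato all' around the farm
```

Note that the pattern removes only one offending character per edge, so a word such as `--potato` would still keep one hyphen.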


Answer 2

You can try:

[a-z][a-z'\-]*[a-z]|[a-z]
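A quick way to see what this pattern does is to run it with `re.findall` on the string from the question. One thing worth noticing is that it requires a word to end in a letter, so the trailing apostrophes in `is'` and `all'` are dropped, which differs from the expected output. The second pattern below is a hypothetical variant (not part of this answer) that also accepts `'` as the last character.

```python
import re

s = "there is' -potato 'all' around- 'the 'farm-"

# Pattern from this answer: a word must start and end with a letter.
strict = re.findall(r"[a-z][a-z'\-]*[a-z]|[a-z]", s)
print(" ".join(strict))   # there is potato all around the farm

# Hypothetical variant: the last character may also be an apostrophe.
lenient = re.findall(r"[a-z](?:[a-z'\-]*[a-z'])?", s)
print(" ".join(lenient))  # there is' potato all' around the farm
```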

Answer 3

Try this:

>>> b=re.findall(r'[a-z][-\'a-z]*[\'a-z]',a)
>>> for i in b: print(i)
... 
potato
potato
human'
potatoes

Answer 4

Assuming every word is separated by a space, you can find all valid words with a regex like this:

(?<= |^)[a-z](?:(?:[\-\'a-z]+)?[\'a-z])?(?= |$)
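One caveat if you use this from Python: the `re` module only accepts fixed-width lookbehinds, and `(?<= |^)` mixes a one-character alternative with a zero-width one, so the pattern does not compile as written. A minimal sketch of a workaround, assuming a negative lookbehind `(?<![^ ])` ("not preceded by a non-space") is an acceptable substitute:

```python
import re

s = "there is' -potato 'all' around- 'the 'farm-"

# The pattern exactly as written above; Python's re rejects the lookbehind.
try:
    re.compile(r"(?<= |^)[a-z](?:(?:[\-\'a-z]+)?[\'a-z])?(?= |$)")
    original_compiles = True
except re.error:
    original_compiles = False

# Assumed equivalent: only the lookbehind is swapped, the rest is unchanged.
valid_word = re.compile(r"(?<![^ ])[a-z](?:(?:[\-\'a-z]+)?[\'a-z])?(?= |$)")

valid = valid_word.findall(s)
print(original_compiles)  # False
print(valid)              # ['there', "is'"]
```

Only `there` and `is'` are reported, because this pattern finds words that are already valid as a whole and does not repair the others.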

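But if you want to eliminate the illegal characters, I think you are better off finding those characters and deleting them. Again, assume the string should contain only words separated by spaces and nothing else.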

First, we can strip every invalid character from the string: [^a-z-' ]

After that, the only things that can still be invalid are a ' or - at the start of a word, or a - at the end of a word.

So we remove those with this regex: (?<= |^)['-]+|-+(?= |$)
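The two steps can be sketched in Python roughly as follows. The function name `clean` and the second test string are made up for illustration. Because Python's `re` requires fixed-width lookbehinds, the sketch writes the word-edge checks as `(?<![^ ])` and `(?![^ ])`, which is an assumed equivalent of `(?<= |^)` and `(?= |$)`.

```python
import re

def clean(text):
    """Two-pass cleanup: drop invalid characters, then trim word edges."""
    # Step 1: keep only lowercase letters, apostrophes, hyphens and spaces.
    text = re.sub(r"[^a-z' -]", "", text)
    # Step 2: remove ' or - at the start of a word and - at the end of a word.
    return re.sub(r"(?<![^ ])['-]+|-+(?![^ ])", "", text)

s = "there is' -potato 'all' around- 'the 'farm-"
print(clean(s))                   # there is' potato all' around the farm
print(clean("don't st0p -me-"))   # don't stp me
```

Since step 2 uses `+`, repeated leading or trailing characters such as `--potato` are removed in one pass.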