Question

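Suppose alphabet is a list of characters. I want to remove every character from a string that does not belong to alphabet. How can I match all of those characters?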

EDIT: alphabet can contain any characters, not necessarily letters.

EDIT 2: Just curious, is it possible to do this with a regexp?

Answer 1

Use the string library. Here I use string.ascii_letters, and you can add digits as well. In this case the valid characters are 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' plus a few extras if you need them: "-_.() "

import string
def valid_name(input):
    valid_chars = "-_.() "+string.ascii_letters + string.digits
    return ''.join(c for c in input if c in valid_chars)

Answer 2

You don't actually need a regex. All you need is:

# "alphabet" can be any string or list of any characters
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 
            'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 
            'u', 'v', 'w', 'x', 'y', 'z']

# "oldstr" is your old string
newstr = ''.join([c for c in oldstr if c not in alphabet])

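In the end, newstr is a new string containing only the characters of oldstr that are not in alphabet. To keep only the characters that are in alphabet instead, as the question asks, change "not in" to "in". Here is a demonstration: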

>>> alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 
...             'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 
...             'u', 'v', 'w', 'x', 'y', 'z']
>>> oldstr = 'abc123'
>>> newstr = ''.join([c for c in oldstr if c not in alphabet])
>>> newstr
'123'
>>>
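The demonstration above drops the characters that are in alphabet, while the question asks for the reverse. Here is a minimal sketch of that reverse filter, assuming a hypothetical helper named keep_only (not part of the original answer). It converts alphabet to a set first so each membership test is a constant-time lookup:

```python
def keep_only(text, alphabet):
    """Return text with every character that is not in alphabet removed."""
    allowed = set(alphabet)  # works for a string or a list of any characters
    return ''.join(c for c in text if c in allowed)

print(keep_only('abc123', 'abc'))       # abc
print(keep_only('a-b_c!', ['-', '_']))  # -_
```

For short alphabets the list works just as well; the set mainly pays off when the alphabet or the input string is large.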

Answer 3

If you want to use a regular expression:

Use this regular expression: [^a-zA-Z]

This matches every non-letter. Warning: it also matches whitespace. To avoid that, use [^a-zA-Z\s] instead.
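As a rough illustration of the difference between the two character classes, the sketch below applies each with re.sub to a sample string chosen here for the example:

```python
import re

text = "hello123 this is a test!!!"

# [^a-zA-Z] matches anything that is not an ASCII letter, spaces included.
print(re.sub(r'[^a-zA-Z]', '', text))    # hellothisisatest

# Adding \s to the negated class leaves whitespace alone.
print(re.sub(r'[^a-zA-Z\s]', '', text))  # hello this is a test
```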

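A simpler way:

You don't need a regex at all. Just create a string of the accepted characters and filter out every character of your string that is not among them. For example: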

import string #allows you to get a string of all letters easily

your_word = "hello123 this is a test!!!"
accepted_characters = string.ascii_lowercase + string.ascii_uppercase + " " #the trailing space keeps spaces from being removed
new_word = ""
for letter in your_word:
    if letter in accepted_characters:
        new_word += letter

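That gives you "hello this is a test".

Super short way:

This approach is not the most readable, but it fits on a single line. It is essentially the same as the approach above, but it uses a list comprehension and the join method to turn the resulting list into a string.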

''.join([letter for letter in your_word if letter in (string.ascii_lowercase + string.ascii_uppercase + " ")])

Answer 4

Here is a solution that uses str.translate() instead of a regular expression:

import string

def delete_chars_not_in_alphabet(s, alphabet=string.letters):
    all_chars = string.maketrans('', '')
    all_except_alphabet = all_chars.translate(None, alphabet)
    return s.translate(None, all_except_alphabet)

Example:

>>> delete_chars_not_in_alphabet('<Hello World!>')
'HelloWorld'
>>> delete_chars_not_in_alphabet('foo bar baz', 'abo ')
'oo ba ba'

Note that if you call this function repeatedly with the same alphabet, you should build all_except_alphabet outside the function (and only once) for efficiency.
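The function above relies on Python 2 features (string.letters, string.maketrans, and the two-argument form of translate). As a hedged sketch of the same idea on Python 3, the table below is a dict subclass whose __missing__ returns None, which str.translate treats as "delete this character". The names _KeepTable and make_keep_table are made up for this example. The table is built once and reused, in line with the efficiency note:

```python
import string

class _KeepTable(dict):
    """Translation table: any code point not listed maps to None (deleted)."""
    def __missing__(self, codepoint):
        return None

def make_keep_table(alphabet=string.ascii_letters):
    # Map each allowed code point to itself; build once, reuse many times.
    return _KeepTable((ord(c), ord(c)) for c in alphabet)

letters_only = make_keep_table()
print('<Hello World!>'.translate(letters_only))          # HelloWorld
print('foo bar baz'.translate(make_keep_table('abo ')))  # oo ba ba
```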

Answer 5

Take a look at re.sub, and use a negated character class such as '[^a-d]' or '[^abcd]'. http://docs.python.org/2.7/library/re.html
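Since alphabet may contain any characters, a class built from it needs escaping for characters that are special inside brackets, such as ], ^, - and \. A minimal sketch, assuming a hypothetical helper named strip_outside_alphabet:

```python
import re

def strip_outside_alphabet(text, alphabet):
    """Delete every character of text that is not in alphabet, using re.sub."""
    if not alphabet:
        return ''  # nothing is allowed, and "[^]" is not a valid pattern
    # re.escape protects characters that are special inside a character class.
    pattern = '[^' + ''.join(re.escape(c) for c in alphabet) + ']'
    return re.sub(pattern, '', text)

print(strip_outside_alphabet('abc123', 'abcd'))            # abc
print(strip_outside_alphabet('a-b]c^d', ['-', ']', '^']))  # -]^
```

If the same alphabet is used many times, compiling the pattern once with re.compile avoids rebuilding it on every call.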

Python regex: how to match letters that do not belong to an alphabet
