Question

I want to remove alphanumeric words (words that mix letters and digits) from a text. For example, I have the following text:

text= I want to remove alphanumeric jhanb562nkk from the text. Remove alphanumeric from all the texts. uhufshfn76429 is very hard to figure out.

Expected result

result=I want to remove alphanumeric from the text. Remove alphanumeric from all the texts.  is very hard to figure out.

I'm not sure how to remove them from the text with a regex/replace approach.

Answer 1

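You can use the following regular expression, which matches letters followed by digits, or digits followed by letters, plus any trailing word characters: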
[A-Za-z]+[\d]+[\w]*|[\d]+[A-Za-z]+[\w]*

Then call re.sub, where rgx_str holds the pattern above:
re.sub(rgx_str, '', text)
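A minimal runnable sketch of the call above, assuming the pattern is stored as a raw string in rgx_str and the question's sample sentence is in text. Note the double spaces left where words were removed:

```python
import re

# Pattern from the answer: letters then digits, or digits then letters,
# optionally followed by more word characters.
rgx_str = r'[A-Za-z]+[\d]+[\w]*|[\d]+[A-Za-z]+[\w]*'

text = ('I want to remove alphanumeric jhanb562nkk from the text. '
        'Remove alphanumeric from all the texts. '
        'uhufshfn76429 is very hard to figure out.')

# Replace every alphanumeric word with the empty string.
result = re.sub(rgx_str, '', text)
print(result)
```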

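Note that this leaves extra spaces wherever an alphanumeric word was removed. A simple way to clean them up is to post-process with a second regex: match " +" and replace it with " ".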
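A sketch of the two-step approach (remove, then collapse spaces). The final .strip() is an addition not in the answer, assumed useful for when an alphanumeric word sits at the very start or end of the text. Unlike the question's expected output, this also collapses the double space before "is":

```python
import re

rgx_str = r'[A-Za-z]+[\d]+[\w]*|[\d]+[A-Za-z]+[\w]*'

text = ('I want to remove alphanumeric jhanb562nkk from the text. '
        'Remove alphanumeric from all the texts. '
        'uhufshfn76429 is very hard to figure out.')

# Step 1: remove the alphanumeric words.
removed = re.sub(rgx_str, '', text)
# Step 2: collapse runs of spaces into a single space; strip leading/trailing spaces.
result = re.sub(r' +', ' ', removed).strip()
print(result)
```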

Answer 2

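It's not clear whether you need to do this with a regular expression or would be happy with any solution. If you don't have to use a regex, a list comprehension works: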

s = 'some con123taminated pure 123 words'
filtered_str = [word for word in s.split() if (all(ch.isdigit() for ch in word) or not any(ch.isdigit() for ch in word))]
filtered_str = ' '.join(filtered_str)
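The same list comprehension applied to the question's sample text, as a sketch. Because the words are re-joined with single spaces, no separate whitespace cleanup is needed:

```python
text = ('I want to remove alphanumeric jhanb562nkk from the text. '
        'Remove alphanumeric from all the texts. '
        'uhufshfn76429 is very hard to figure out.')

# Keep a word only if it is all digits or contains no digits at all.
kept = [word for word in text.split()
        if all(ch.isdigit() for ch in word) or not any(ch.isdigit() for ch in word)]

# Joining with single spaces also removes the gaps left by dropped words.
result = ' '.join(kept)
print(result)
```

One caveat: split() keeps punctuation attached to words, so a token such as "2020." is dropped as well, since "." is not a digit.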

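I admit it's not easy to read, but the only potentially obscure part is all(.) or not any(.). It ensures that either every character in the word is a digit, or none of them is.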
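If readability is a concern, the condition can be pulled into a named function. Both helper names below are hypothetical, introduced only for illustration; the logic is the same all(.) or not any(.) test:

```python
def is_pure_word(word: str) -> bool:
    # Hypothetical helper: True when the word is all digits or contains no digits.
    digit_flags = [ch.isdigit() for ch in word]
    return all(digit_flags) or not any(digit_flags)


def remove_mixed_words(text: str) -> str:
    # Hypothetical helper: drop whitespace-separated words that mix digits
    # with other characters, then re-join with single spaces.
    return ' '.join(word for word in text.split() if is_pure_word(word))


print(remove_mixed_words('some con123taminated pure 123 words'))
```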

Replacing alphanumeric words in text with regex/replace in Python
