Question

I am trying to write a simple program that removes every word containing a digit from a given string.

Here is my current implementation:

import re

def checkio(text):

    text = text.replace(",", " ").replace(".", " ").replace("!", " ").replace("?", " ").lower()
    counter = 0
    words = text.split()

    print(words)

    for each in words:
        if bool(re.search(r'\d', each)):
            words.remove(each)

    print(words)

checkio("1a4 4ad, d89dfsfaj.")

However, when I run this program, I get the following output:

['1a4', '4ad', 'd89dfsfaj']
['4ad']

I can't figure out why '4ad' is printed on the second line: it contains a digit, so it should have been removed from the list. Any ideas?

Answer 1

If you are testing for alphanumeric strings, why not use isalnum() instead of a regular expression?

In [1695]: x = ['1a4', '4ad', 'd89dfsfaj']

In [1696]: [word for word in x if not word.isalnum()]
Out[1696]: []
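Note that str.isalnum() returns True for any non-empty string made only of letters and digits, including words with no digits at all ('abc'.isalnum() is True), so filtering on it would also drop digit-free words. A minimal non-regex sketch that checks each character with str.isdigit() instead; the helper name has_digit and the extra word 'abc' are hypothetical, added for illustration:

```python
def has_digit(word):
    """Return True if any character in word is a digit (hypothetical helper)."""
    return any(ch.isdigit() for ch in word)


x = ['1a4', '4ad', 'd89dfsfaj', 'abc']

# isalnum() is True for all four words, including the digit-free 'abc'
print([word.isalnum() for word in x])  # [True, True, True, True]

# Keep only the words that contain no digit
print([word for word in x if not has_digit(word)])  # ['abc']
```

The generator inside any() stops at the first digit it finds, so no regular expression is needed.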

Answer 2

Assuming your regular expression does what you want, you can do this to avoid removing items while iterating:

import re

def checkio(text):

    text = re.sub(r'[,.?!]', ' ', text).lower()
    words = [w for w in text.split() if not re.search(r'\d', w)]
    print(words)  # prints [] in this case

Also note that I simplified your text = text.replace(...) line.
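A small check, using the sample string from the question, that a single re.sub() with a character class produces the same text as the chain of str.replace() calls:

```python
import re

text = "1a4 4ad, d89dfsfaj."

# Original approach: one str.replace() call per punctuation mark
chained = text.replace(",", " ").replace(".", " ").replace("!", " ").replace("?", " ").lower()

# One substitution: the character class matches any of , . ? !
# (inside [...] these characters need no backslash escaping)
single = re.sub(r'[,.?!]', ' ', text).lower()

print(chained == single)  # True
print(single.split())     # ['1a4', '4ad', 'd89dfsfaj']
```

Adding another punctuation mark then means adding one character to the class rather than another replace() call.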

Also, if you don't need to reuse the text variable, you can split it directly with a regular expression:

import re

def checkio(text):

    words = [w for w in re.split(r'[\s,.?!]+', text.lower()) if w and not re.search(r'\d', w)]
    print(words)  # prints [] in this case
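A sketch of the direct-split approach wrapped in a function that returns the result, assuming words are separated by whitespace and/or the marks , . ? ! (the function name words_without_digits and the second sample sentence are hypothetical). Including \s in the character class, with a + quantifier, keeps the split at word level; a class of punctuation alone leaves space-separated words glued together, so a digit-free word next to a word with a digit would be dropped along with it.

```python
import re


def words_without_digits(text):
    """Return the lowercase words of text that contain no digit (hypothetical helper)."""
    # Split on runs of whitespace and/or , . ? !
    pieces = re.split(r'[\s,.?!]+', text.lower())
    # re.split() yields '' when text starts or ends with a separator,
    # so drop empty pieces as well as pieces containing a digit
    return [w for w in pieces if w and not re.search(r'\d', w)]


print(words_without_digits("1a4 4ad, d89dfsfaj."))   # []
print(words_without_digits("Hello w0rld, bye now!"))  # ['hello', 'bye', 'now']

# Splitting on punctuation alone keeps space-separated words together
print(re.split(r'[,.?!]', "1a4 4ad, d89dfsfaj."))    # ['1a4 4ad', ' d89dfsfaj', '']
```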

Answer 3

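What is happening is essentially a concurrent modification problem: you are removing elements from the list while iterating over it. Python does not raise an error for this; the loop simply skips elements.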

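On the first iteration, words = ['1a4', '4ad', 'd89dfsfaj']. Because '1a4' contains a digit, we remove it. Now words = ['4ad', 'd89dfsfaj']. On the second iteration, however, the current word is 'd89dfsfaj', and we remove that too. We skipped '4ad' because, after the first removal, it sits at index 0 while the for loop's internal position has already advanced to index 1.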
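A sketch that reproduces the trace above by recording which word each iteration actually sees, followed by two common fixes: iterating over a shallow copy (words[:]) while removing from the original, and building a new list instead of mutating the one being iterated. The variable names seen, buggy and fixed are illustrative only.

```python
import re

words = ['1a4', '4ad', 'd89dfsfaj']

# Buggy version: remove from the same list the loop is iterating over,
# recording which word each iteration actually visits
seen = []
buggy = list(words)
for each in buggy:
    seen.append(each)
    if re.search(r'\d', each):
        buggy.remove(each)
print(seen)   # ['1a4', 'd89dfsfaj']  ('4ad' is never visited)
print(buggy)  # ['4ad']

# Fix 1: iterate over a shallow copy, remove from the original
fixed = list(words)
for each in fixed[:]:
    if re.search(r'\d', each):
        fixed.remove(each)
print(fixed)  # []

# Fix 2: build a new list rather than mutating during iteration
print([w for w in words if not re.search(r'\d', w)])  # []
```

Fix 2 is usually preferred: list.remove() rescans the list on every call, while the comprehension makes a single pass.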
