Question

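I want to use a regular expression to filter user input against a set of tuples. If the user input is not found in the set of tuples and is not an alphanumeric character, an error message should be returned. I don't know how to access the tuples from my Python regex code, so I passed in src.items(). How do I use escaping so that src.items() brings in its values, or should I not be doing this at all?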

My code:

import re

direction = ('north', 'south', 'east', 'west', 'down', 'up', 'left', 'right', 'back')
verb = ('go', 'stop', 'kill', 'eat')
stop = ('the', 'in', 'of', 'from', 'at', 'it')
noun = ('door', 'bear', 'princess', 'cabinet')    

src = {'direction': direction,
       'verb': verb,
       'stop': stop,
       'noun': noun
       }

# use this to pick out error strings from user input
er = r"*[\W | src.items()]"
ep = re.compile(er, re.IGNORECASE)

Answer 1

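First, there is a redundancy here:

"If the user input is not found in the set of tuples and is not an alphanumeric character, an error message should be returned"

If the user input is in your set of tuples, how could it contain non-alphanumeric characters? Also, you don't specify whether you are testing a single word at a time or a complete phrase.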
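For comparison, if you did want to keep a regex, here is a sketch of how the tuple values could be pulled into the pattern. Note that src.items() yields (key, tuple) pairs, so the words have to be flattened out of src.values() first; each word then goes through re.escape so it is matched literally. This assumes "error strings" means any word not in the tuples, plus any character that is neither alphanumeric nor whitespace.

```python
import re

direction = ('north', 'south', 'east', 'west', 'down', 'up', 'left', 'right', 'back')
verb = ('go', 'stop', 'kill', 'eat')
stop = ('the', 'in', 'of', 'from', 'at', 'it')
noun = ('door', 'bear', 'princess', 'cabinet')

src = {'direction': direction, 'verb': verb, 'stop': stop, 'noun': noun}

# Flatten the words out of the tuples (src.items() would give (key, tuple) pairs).
words = [word for group in src.values() for word in group]

# Escape each word so it is matched literally, then join them into one alternation.
alternation = '|'.join(re.escape(word) for word in words)

# First branch: a whole word that is NOT one of the known words (negative lookahead).
# Second branch: a single character that is neither a word character nor whitespace.
er = r'\b(?!(?:' + alternation + r')\b)\w+|[^\w\s]'
ep = re.compile(er, re.IGNORECASE)

print(ep.findall("Go north, to the door!"))  # [',', 'to', '!']
```

It works, but the pattern is hard to read and maintain, which is why the plain Python approach is usually preferable.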

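Let's try a different approach. First, don't use two levels of data structure (i.e., use just the dictionary). Second, we'll switch the tuples to lists, not for technical reasons but for semantic ones (homogeneous -> list, heterogeneous -> tuple). We'll drop the regular expression for now in favor of a simple split() and an "in" membership test. Finally, we'll test complete phrases: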

vocabulary = {
    'direction': ['north', 'south', 'east', 'west', 'down', 'up', 'left', 'right', 'back'],
    'verb': ['go', 'stop', 'kill', 'eat'],
    'stop': ['the', 'in', 'of', 'from', 'at', 'it'],
    'noun': ['door', 'bear', 'princess', 'cabinet']
    }

vocabulary_list = [word for sublist in vocabulary.values() for word in sublist]

phrases = ["Go in the east door", "Stop at the cabinet", "Eat the bear", "Do my taxes"]

# use this to pick out error strings from user input
for phrase in phrases:
    # a phrase is invalid if any of its words (lowercased) is not in the vocabulary
    if any(term.lower() not in vocabulary_list for term in phrase.split()):
        print(phrase, "-> invalid")
    else:
        print(phrase, "-> valid")

Produces:

Go in the east door -> valid
Stop at the cabinet -> valid
Eat the bear -> valid
Do my taxes -> invalid
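The question also asks for an error message rather than a valid/invalid label. One possible variant, a sketch with a made-up function name (check_phrase) and message wording, collects the unknown words and switches the flat list to a set, so each lookup is a hash check rather than a scan through the list:

```python
vocabulary = {
    'direction': ['north', 'south', 'east', 'west', 'down', 'up', 'left', 'right', 'back'],
    'verb': ['go', 'stop', 'kill', 'eat'],
    'stop': ['the', 'in', 'of', 'from', 'at', 'it'],
    'noun': ['door', 'bear', 'princess', 'cabinet'],
}

# A set gives average O(1) membership tests instead of scanning a list.
vocabulary_set = {word for sublist in vocabulary.values() for word in sublist}

def check_phrase(phrase):
    """Hypothetical helper: return None if every word is known, else an error message."""
    unknown = [term for term in phrase.split() if term.lower() not in vocabulary_set]
    if unknown:
        return "Unknown word(s): " + ", ".join(unknown)
    return None

for phrase in ["Go in the east door", "Do my taxes"]:
    print(phrase, "->", check_phrase(phrase) or "valid")
```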

From here, you could consider allowing some punctuation, such as commas and periods, and simply stripping it out rather than judging it.
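A minimal sketch of that idea, assuming only ASCII punctuation needs to be tolerated: str.maketrans with string.punctuation builds a table that deletes those characters before the same split() test. The function name is_valid is hypothetical.

```python
import string

vocabulary_list = ['north', 'south', 'east', 'west', 'down', 'up', 'left', 'right', 'back',
                   'go', 'stop', 'kill', 'eat',
                   'the', 'in', 'of', 'from', 'at', 'it',
                   'door', 'bear', 'princess', 'cabinet']

# Translation table that deletes every ASCII punctuation character.
strip_punctuation = str.maketrans('', '', string.punctuation)

def is_valid(phrase):
    """Hypothetical helper: True if every word, after removing punctuation, is known."""
    cleaned = phrase.translate(strip_punctuation)
    return all(term.lower() in vocabulary_list for term in cleaned.split())

print(is_valid("Go in the east door."))   # True
print(is_valid("Stop, at the cabinet!"))  # True
print(is_valid("Do my taxes."))           # False
```

Input made only of punctuation becomes an empty string and counts as valid here, so add an emptiness check if that matters.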

Answer 2

This isn't a good place to use a regular expression, and what you have isn't remotely a valid Python regular expression.

You'd be better off simply checking, in a loop, whether the user input (possibly forced to lowercase) equals any of the commands.
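A minimal sketch of that suggestion, assuming the input is a single word and every command is a single word; commands and is_command are illustrative names built from the question's tuples.

```python
direction = ('north', 'south', 'east', 'west', 'down', 'up', 'left', 'right', 'back')
verb = ('go', 'stop', 'kill', 'eat')
stop = ('the', 'in', 'of', 'from', 'at', 'it')
noun = ('door', 'bear', 'princess', 'cabinet')

commands = direction + verb + stop + noun  # concatenating tuples gives one flat tuple

def is_command(user_input):
    """Return True if the stripped, lowercased input equals one of the commands."""
    word = user_input.strip().lower()
    for command in commands:
        if word == command:
            return True
    return False

print(is_command("North"))  # True
print(is_command("taxes"))  # False
```

The explicit loop mirrors the wording above; the expression word in commands performs the same comparison in one step.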
