Question

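I am currently using the filter below to increment elements of `arr`, given a list of strings as the argument. Is there an efficient way to do this in Python? I have millions of such lists to check.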


Answer 1

For the regular expressions given in the question, you can use the following single regex, which uses a character class:

[admut]-

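`[admut]` matches any one of a, d, m, u, t.
`^` can be omitted, because `re.match` only matches at the beginning of the string.
`-*` is removed because it adds nothing; a single `-` is enough to check that a/d/m/u/t is followed by `-`.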
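To see the points above in action, here is a small check of the simplified pattern against a few made-up strings (the sample inputs are illustrative only). It relies only on `re.match`, which anchors at position 0, and `re.search`, which scans the whole string:

```python
import re

pattern = re.compile("[admut]-")

# re.match only tries position 0, so no ^ anchor is needed,
# and extra dashes after the first one don't change the result
for s in ["a-", "a---", "u-foo", "b-", "xa-", "a"]:
    print(repr(s), pattern.match(s) is not None)

# re.search, by contrast, scans the whole string, so "xa-" is found at index 1
print(pattern.search("xa-").start())  # 1
```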

Instead of an array you can use a dictionary; then there is no need to remember which index belongs to which letter:

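import re

def countbycat(tempfilter):
    # one counter per category letter, all starting at 0
    count = dict.fromkeys('admut', 0)
    pattern = re.compile("[admut]-")
    for each in tempfilter:
        if pattern.match(each):
            # the first character is the category letter
            count[each[0]] += 1
    return count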

This uses `dict.fromkeys` to pre-populate all five keys with 0; `collections.Counter` is an alternative.
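A sketch of the `collections.Counter` alternative, assuming the same input format; `countbycat_counter` is a hypothetical name. Unlike the `dict.fromkeys` version, the result only contains letters that actually occurred, though looking up a missing letter still returns 0:

```python
import re
from collections import Counter

PATTERN = re.compile("[admut]-")

def countbycat_counter(tempfilter):
    """Hypothetical Counter-based variant of countbycat (name is illustrative)."""
    # Counter treats missing keys as 0, so no pre-population is needed
    return Counter(each[0] for each in tempfilter if PATTERN.match(each))

counts = countbycat_counter(["a-1", "a--", "d-x", "zz", "t"])
print(counts["a"], counts["d"], counts["m"])  # 2 1 0
```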

Answer 2

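Don't use regular expressions. You are checking very specific, fixed conditions: namely that the second character is `-` (`each[1] == '-'`) and that `each[0] in 'admut'`. Both checks are much faster than a regex. The latter can also be done with a mapping:

def countbycat(tempfilter):
    arr = [0, 0, 0, 0, 0]
    char_idx = {  # map u/m/d/a/t to their positions in arr
        'u': 0,
        'm': 1,
        'd': 2,
        'a': 3,
        't': 4,
    }
    for each in tempfilter:
        if each[1:2] == '-':  # second character is '-' (slicing avoids IndexError on short strings)
            try:
                arr[char_idx[each[0]]] += 1  # increment the slot for this letter
            except KeyError:  # each[0] is not one of a/d/m/u/t
                pass
    return arr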
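To check that the string test really agrees with the regex `[admut]-`, and to measure the speed difference on your own data, something like the following sketch could be used. The helper names and sample strings are made up; no timings are claimed here, since they depend on the machine and the input:

```python
import re
import timeit

PATTERN = re.compile("[admut]-")
SAMPLES = ["a-1", "d--x", "m", "", "x-", "u-", "t+", "ad-"]

def regex_ok(s):
    # regex check, as in the first answer
    return PATTERN.match(s) is not None

def plain_ok(s):
    # string-only check: second char is '-' and first char is one of a/d/m/u/t
    # (the length is guaranteed >= 2 before s[0] is evaluated)
    return s[1:2] == '-' and s[0] in 'admut'

if __name__ == "__main__":
    data = SAMPLES * 10_000
    for fn in (regex_ok, plain_ok):
        seconds = timeit.timeit(lambda: sum(map(fn, data)), number=1)
        print(f"{fn.__name__}: {seconds:.4f}s")
```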


Answer 3

For your simple case, go with falsetru's answer.

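In general, you can combine the patterns into a single regex (provided your regexes contain no capturing groups of their own) and check which group the match came from: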

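import re
from collections import Counter

patterns = ["^[a]-+", "^[d]-+", "^[m]-+", "^[u]-+", "^[t]-+"]

# wrap each pattern in its own capturing group and join them with |
complex_pattern = re.compile('|'.join('(%s)' % p for p in patterns))

# imperative way
def countbycat(tempfilter):
    arr = [0, 0, 0, 0, 0]
    for each in tempfilter:
        match = complex_pattern.match(each)
        if match:
            # lastindex is the 1-based number of the group that matched
            # (lastgroup would be None here, because the groups are unnamed)
            arr[match.lastindex - 1] += 1
    return arr

# functional way: returns a Counter keyed by pattern index
def countbycat_functional(tempfilter):
    matches_or_none = (complex_pattern.match(each) for each in tempfilter)
    return Counter(match.lastindex - 1 for match in matches_or_none if match is not None)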
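If you would rather get the category by name than by position, the same combined-pattern idea works with named groups, and then `match.lastgroup` does return a string. A sketch, assuming one named group per letter; the dict layout and `countbycat_named` are illustrative:

```python
import re
from collections import Counter

# one pattern per category; re.match anchors at the start, so ^ is omitted
patterns = {"a": "[a]-+", "d": "[d]-+", "m": "[m]-+", "u": "[u]-+", "t": "[t]-+"}

# (?P<name>...) creates a named group, so match.lastgroup is the dict key
named_pattern = re.compile(
    "|".join("(?P<%s>%s)" % (name, p) for name, p in patterns.items())
)

def countbycat_named(tempfilter):
    counts = Counter()
    for each in tempfilter:
        match = named_pattern.match(each)
        if match:
            counts[match.lastgroup] += 1
    return counts

print(countbycat_named(["a-1", "u--", "u-x", "q-", "t"]))
```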

How to match a string against multiple regular expressions?
