Check for a partial match in one list against a partial match in another list - possible with a list comprehension?

Time: 2014-02-16 23:06:26

Tags: python regex python-2.7 match list-comprehension

Python/programming newbie here.

I've written code that does what I need:

import re
syns = ['professionals|experts|specialists|pros', 'repayed|payed back', 'ridiculous|absurd|preposterous', 'salient|prominent|significant' ]
new_syns = ['repayed|payed back', 'ridiculous|crazy|stupid', 'salient|prominent|significant', 'winter-time|winter|winter season', 'professionals|pros']

def pipe1(syn):
    # Find first word/phrase in list element up to and including the 1st pipe
    r = r'.*?\|'
    m = re.match(r, syn)
    m = m.group()
    return m

def find_non_match():
    # Compare 'new_syns' with 'syns' and create new list from non-matches in 'new_syns'
    p = '@#&'   # Place holder created
    joined = p.join(syns)
    joined = p + joined   # Adds place holder to beginning of string too
    non_match = []
    for syn in new_syns:
        m = pipe1(syn)
        m = p + m
        if m not in joined:
            non_match.append(syn)
    return non_match

print find_non_match()

Printed output:

['winter-time|winter|winter season']

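The code checks whether the first word/phrase (everything up to the first pipe) of each element in new_syns matches the first word/phrase of some element in syns. The actual aim is to find the non-matches and append them to a new list called non_match.

However, I'm curious whether the same thing can be achieved in far fewer lines with a list comprehension. I've tried, but I'm not getting quite what I want. This is what I've come up with so far: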

import re
syns = ['professionals|experts|specialists|pros', 'repayed|payed back', 'ridiculous|absurd|preposterous', 'salient|prominent|significant' ]
new_syns = ['repayed|payed back', 'ridiculous|crazy|stupid', 'salient|prominent|significant', 'winter-time|winter|winter season', 'professionals|pros']

def pipe1(syn):
    # Find first word/phrase in list element up to and including the 1st pipe
    r = r'.*?\|'
    m = re.match(r, syn)
    m = '@#&' + m.group() # Add unusual symbol combo to create match for beginning of element
    return m

non_match = [i for i in new_syns if pipe1(i) not in '@#&'.join(syns)]
print non_match

Printed output:

['winter-time|winter|winter season', 'professionals|pros'] # I don't want 'professionals|pros' in the list

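The catch in the list comprehension is that when syns is joined with @#&, the joined string has no @#& at its beginning, whereas in the original code above, which doesn't use a list comprehension, I add @#& to the beginning of the joined string. As a result, 'professionals|pros' has slipped through the net. But I don't know how to fix this inside the list comprehension.

So my question is: is this possible with a list comprehension?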
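To make the difference concrete, here is a small check of the two joined strings. It is only a sketch using the same syns list; the names needle, joined_plain and joined_prefixed are introduced here for illustration and do not appear in the code above.

```python
syns = ['professionals|experts|specialists|pros', 'repayed|payed back',
        'ridiculous|absurd|preposterous', 'salient|prominent|significant']

p = '@#&'                            # same placeholder as in the code above
needle = p + 'professionals|'        # what the second pipe1 returns for 'professionals|pros'

joined_plain = p.join(syns)          # no placeholder in front of the first element
joined_prefixed = p + p.join(syns)   # placeholder in front of every element

print(needle in joined_plain)        # False: the string starts with 'professionals|'
print(needle in joined_prefixed)     # True
```

Only the first element of syns is affected, because join puts the placeholder between elements and never before the first one.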

1 answer:

Answer 0: (score: 1)

I think you want something like this:

non_match = [i for i in new_syns if not any(any(w == s.split("|")[0] 
                                                for w in i.split("|")) 
                                            for s in syns)]

This doesn't use a regex, but it gives the result

non_match == ['winter-time|winter|winter season']

The list contains every item i from new_syns for which not any of the words w in i.split("|") equals the first word, s.split("|")[0], of any synonym group s in syns.
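If the intent is only to compare the first word of each group, as the question's pipe1 approach does, a set of first words may be simpler to read. The following is a minimal sketch under that assumption, not the code from this answer; first_words is a name introduced here for illustration. Note that it differs slightly from the comprehension above, which tests every word of a new_syns item rather than just the first.

```python
syns = ['professionals|experts|specialists|pros', 'repayed|payed back',
        'ridiculous|absurd|preposterous', 'salient|prominent|significant']
new_syns = ['repayed|payed back', 'ridiculous|crazy|stupid',
            'salient|prominent|significant',
            'winter-time|winter|winter season', 'professionals|pros']

# First word/phrase of every existing synonym group
first_words = {s.split('|')[0] for s in syns}

# Keep only the new groups whose first word is not already a first word in syns
non_match = [i for i in new_syns if i.split('|')[0] not in first_words]

print(non_match)
```

For the lists in the question this gives the same single non-match, and the set lookup avoids rescanning syns for every item in new_syns.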