Question

I have the following two Python lists.

prob_tokens = ['119', '120', '123', '1234', '12345']

complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']

min_len_sec_list = 3
max_len_sec_list = 5

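I want to create a dictionary with the elements of the first list as keys, subject to the following constraints:

If the key is not present in the second list, the value will be False.
If the key is present in the second list with variants, the value will be False.

For example:

(i) When checking 123, if 1234 or 12345 (123*) is present in the second list, the value for 123 will be False.

(ii) Similarly, when checking 1234, if 12345 (1234*) is present, the value will be False.

Here * will be [0-9]{(max_len-len_token)}

If the key is present in the second list with no variants, the value will be True.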

Expected output:

final_token_dict = {'119': False, '120': True, '123': False, '1234': False, '12345': True}

How can I achieve this? Any suggestions are welcome. Thanks in advance!

Answer 1

You can use a custom function with a dictionary comprehension:

prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']

def mapper(val, ref_list):
    if any(x.startswith(val) and (len(x) > len(val)) for x in ref_list):
        return False
    if val in ref_list:
        return True
    return False

res = {i: mapper(i, complete_tokens) for i in prob_tokens}

print(res)

{'119': False, '120': True, '123': False, '1234': False, '12345': True}

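If the character-count criterion matters to you, you can use a chained comparison and an extra argument to adjust the logic accordingly: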

def mapper(val, ref_list, max_len):
    if any(x.startswith(val) and (0 < (len(x) - len(val)) <= max_len) for x in ref_list):
        return False
    if val in ref_list:
        return True
    return False

min_len_sec_list = 3
max_len_sec_list = 5

add_lens = max_len_sec_list - min_len_sec_list

res = {i: mapper(i, complete_tokens, add_lens) for i in prob_tokens}
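
To see what the length bound changes, here is a minimal sketch. The `mapper` is the function above; the list `ref` and the bound `2` are made-up illustration values, not data from the question. A longer entry only counts as a variant when it is at most `max_len` characters longer than the key.

```python
def mapper(val, ref_list, max_len):
    # A variant is a longer entry that starts with val and is at most
    # max_len characters longer than val.
    if any(x.startswith(val) and (0 < (len(x) - len(val)) <= max_len) for x in ref_list):
        return False
    if val in ref_list:
        return True
    return False

# Hypothetical reference list, chosen only to exercise the length bound.
ref = ['123', '123456', '77', '778']

res = {key: mapper(key, ref, 2) for key in ['123', '77', '999']}
print(res)
# {'123': True, '77': False, '999': False}
```

Here '123456' is three characters longer than '123', so it is ignored and '123' stays True, while '778' is within the bound and makes '77' False. With the lists from the question and add_lens = 2, the result is the same as in the first version.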

Answer 2

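You can convert the list to a Trie, or Prefix Tree, structure, then check whether any key is a prefix in that trie. This will be faster than checking each element of the list individually to see whether the key is a prefix of it. More specifically, if you have k elements in the prob_tokens list and n elements in complete_tokens, this will only be O(n + k), while checking each pair is O(n * k).¹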

def make_trie(lst):
    trie = {}
    for key in lst:
        t = trie
        for c in key:
            t = t.setdefault(c, {})
    return trie

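def check_trie(trie, key):
    for c in key:
        trie = trie.get(c, None)
        if trie is None:
            return False  # not in trie
    # Only decide after the whole key is consumed; otherwise a key that
    # runs past a leaf (e.g. '1200' when '120' is a leaf) would return True.
    return trie == {}  # True for a leaf, False if longer tokens continue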

prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']

trie = make_trie(complete_tokens)
# {'1': {'1': {'2': {}}, '2': {'0': {}, '1': {}, '3': {'3': {}, '4': {'5': {}}, '5': {}}}}}
res = {key: check_trie(trie, key) for key in prob_tokens}
# {'119': False, '120': True, '123': False, '1234': False, '12345': True}

¹) Actually, the average length of the keys is also a factor, but that is the case in both approaches.
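
The trie above does not record where a token ends; it relies on the fact that a leaf always ends a token. As a variant sketch, an explicit end marker makes the two conditions (token present, no longer variant) visible separately. The `END` sentinel and the names `build_trie` and `is_exact_without_variants` are hypothetical, introduced only for this illustration.

```python
END = object()  # hypothetical sentinel key meaning "a token ends at this node"

def build_trie(tokens):
    trie = {}
    for token in tokens:
        node = trie
        for ch in token:
            node = node.setdefault(ch, {})
        node[END] = True  # mark the end of a complete token
    return trie

def is_exact_without_variants(trie, key):
    node = trie
    for ch in key:
        node = node.get(ch)
        if node is None:
            return False  # key is not even a prefix of any token
    # True only if a token ends here and nothing continues past it
    return END in node and len(node) == 1

prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']

trie = build_trie(complete_tokens)
res = {key: is_exact_without_variants(trie, key) for key in prob_tokens}
print(res)
# {'119': False, '120': True, '123': False, '1234': False, '12345': True}
```

Building the trie visits every character of every token once, and each lookup visits at most the characters of the key, which is where the O(n + k) estimate (times the key length) comes from.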

Answer 3

You can use any:

a = ['119', '120', '123', '1234', '12345']
b = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
new_d = {c:c in b and not any(i.startswith(c) and len(c) < len(i) for i in b) for c in a}

Output:

{'120': True, '1234': False, '119': False, '123': False, '12345': True}

Answer 4

This may be another option:

import re

prob_tokens = ['119', '120', '123', '1234', '12345']

complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']

dictionary = dict()
for tok in prob_tokens:
    if tok not in complete_tokens or any([bool(re.compile(r'^%s\d+'%tok).search(tok2)) for tok2 in complete_tokens]):
        dictionary[tok] = False
    else:
        dictionary[tok] = True

print(dictionary)
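
The question describes the wildcard as [0-9]{(max_len-len_token)}. A minimal sketch of building that pattern per token follows. It assumes the quantifier is meant as "between 1 and max_len - len(token) extra digits", since the question's own example treats 1234 as a variant of 123. The helper name `has_variant` is hypothetical.

```python
import re

prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
max_len_sec_list = 5

def has_variant(tok, candidates, max_len):
    extra = max_len - len(tok)  # how many more digits a variant may have
    if extra <= 0:
        return False  # tok is already at the maximum length
    # re.escape guards against tokens that contain regex metacharacters.
    # Assumption: 1 to `extra` trailing digits count as a variant.
    pattern = re.compile(re.escape(tok) + r'[0-9]{1,%d}' % extra)
    return any(pattern.fullmatch(c) for c in candidates)

dictionary = {
    tok: tok in complete_tokens and not has_variant(tok, complete_tokens, max_len_sec_list)
    for tok in prob_tokens
}
print(dictionary)
# {'119': False, '120': True, '123': False, '1234': False, '12345': True}
```

Compiling the pattern once per token, outside the inner loop, avoids recompiling it for every candidate.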

Answer 5

I guess you could also try something like this:

from collections import Counter

prob_tokens = ['119', '120', '123', '1234', '12345']

complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']

result = {}
for token in prob_tokens:
    token_len = len(token)

    # Create counts of prefix lengths
    counts = Counter(c[:token_len] for c in complete_tokens)

    # Set True if only one prefix found, else False
    result[token] = counts[token] == 1

print(result)

Which outputs:

{'119': False, '120': True, '123': False, '1234': False, '12345': True}
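
Two caveats apply to a count-based check like this one. A token that appears twice in complete_tokens is counted twice and comes out False, and a token that is merely a prefix of exactly one entry comes out True even though it is not in the list itself. A minimal sketch that guards against both, assuming duplicates should be treated as a single entry (the lists below are made up for illustration):

```python
from collections import Counter

# Hypothetical data: '120' is duplicated, and '98' is only a prefix of '987'.
prob_tokens = ['120', '123', '98']
complete_tokens = ['120', '120', '123', '1234', '987']

unique_tokens = set(complete_tokens)  # treat duplicates as one entry

result = {}
for token in prob_tokens:
    # Count how many distinct entries share this token as a prefix
    counts = Counter(c[:len(token)] for c in unique_tokens)
    # Require the token itself to be present, not just one matching prefix
    result[token] = token in unique_tokens and counts[token] == 1

print(result)
# {'120': True, '123': False, '98': False}
```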

Answer 6

A plain dict comprehension does the job, with the value being True if the total number of elements in complete_tokens that start with the given key is 1

prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']

res = {elem:sum(v.startswith(elem) for v in complete_tokens)==1 for elem in prob_tokens}
print (res)

Output

{'119': False, '120': True, '123': False, '1234': False, '12345': True}

For better efficiency, you can convert complete_tokens to a set, and then use any rather than checking every element

complete_tokens_set = set(complete_tokens)
res = {elem:elem in complete_tokens_set and not any(v!=elem and v.startswith(elem) for v in complete_tokens_set) for elem in prob_tokens}
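
If the lists get large, one further option is a sorted list with binary search. This is a different technique from the set-based version above, sketched here under the assumption that the tokens are plain strings; the helper name `unique_exact` is hypothetical. In lexicographic order, any longer string that starts with a key sorts directly after the key, so only the next entry needs to be checked.

```python
from bisect import bisect_left

prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']

# Deduplicate, then sort once up front
sorted_tokens = sorted(set(complete_tokens))

def unique_exact(key, tokens):
    i = bisect_left(tokens, key)
    if i == len(tokens) or tokens[i] != key:
        return False  # key is not in the list
    # Any variant of key would be the immediate successor in sorted order
    return not (i + 1 < len(tokens) and tokens[i + 1].startswith(key))

res = {elem: unique_exact(elem, sorted_tokens) for elem in prob_tokens}
print(res)
# {'119': False, '120': True, '123': False, '1234': False, '12345': True}
```

Each lookup is a binary search plus one startswith call, instead of a scan over every element.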

Create a custom dictionary from two lists

6 answers: