I have the following two Python lists.
prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
min_len_sec_list = 3
max_len_sec_list = 5
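I want to create a dictionary that uses the elements of the first list as keys, under the following constraint: a key's value is True only if the key appears in the second list and no longer token in the second list starts with that key; otherwise the value is False. For example:
(i) When checking 123, if 1234 or 12345 (123*) exists in the second list, the value for 123 is False.
(ii) Similarly, when checking 1234, if 12345 (1234*) exists, the value is False.
Here * stands for [0-9]{(max_len-len_token)}.
Output:
final_token_dict = {'119': False, '120': True, '123': False, '1234': False, '12345': True}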
Any suggestions on how to implement this? Thanks in advance!
Answer 0 (score: 7)
You can use a custom function with a dictionary comprehension:
prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
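def mapper(val, ref_list):
    # False if any strictly longer token in ref_list starts with val
    if any(x.startswith(val) and (len(x) > len(val)) for x in ref_list):
        return False
    # True only if val itself is present in ref_list
    if val in ref_list:
        return True
    return False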
res = {i: mapper(i, complete_tokens) for i in prob_tokens}
print(res)
{'119': False, '120': True, '123': False, '1234': False, '12345': True}
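If the character-count criterion matters to you, you can use a chained comparison and an extra argument to adjust the logic accordingly:
def mapper(val, ref_list, max_len):
    # Only extensions that add between 1 and max_len characters count
    if any(x.startswith(val) and (0 < (len(x) - len(val)) <= max_len) for x in ref_list):
        return False
    if val in ref_list:
        return True
    return False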
min_len_sec_list = 3
max_len_sec_list = 5
add_lens = max_len_sec_list - min_len_sec_list
res = {i: mapper(i, complete_tokens, add_lens) for i in prob_tokens}
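To see where the length bound changes the outcome, here is a small sketch that runs both versions of the function on a made-up input. The names mapper_unbounded and mapper_bounded and the sample list are illustrative only and are not part of the original answer.

```python
def mapper_unbounded(val, ref_list):
    # False if any strictly longer token starts with val
    if any(x.startswith(val) and len(x) > len(val) for x in ref_list):
        return False
    return val in ref_list


def mapper_bounded(val, ref_list, max_len):
    # Only extensions that add between 1 and max_len characters count
    if any(x.startswith(val) and 0 < len(x) - len(val) <= max_len for x in ref_list):
        return False
    return val in ref_list


# Hypothetical sample: '12345' extends '12' by three characters
sample = ['12', '12345']

unbounded = mapper_unbounded('12', sample)
bounded = mapper_bounded('12', sample, 2)
print(unbounded, bounded)
```

With a bound of 2 the three-character extension is ignored, so '12' maps to True; without the bound it maps to False.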
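Answer 1 (score: 4)
You can convert the list into a trie (prefix tree) structure and then check whether each key is a prefix of another token in that trie. This will be faster than checking, for each element, whether it is a prefix of every element of the list. More specifically, if there are k elements in prob_tokens and n elements in complete_tokens, this takes only O(n + k), whereas checking every pair is O(n * k) (see note 1).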
def make_trie(lst):
    trie = {}
    for key in lst:
        t = trie
        for c in key:
            t = t.setdefault(c, {})
    return trie
def check_trie(trie, key):
    for c in key:
        trie = trie.get(c, None)
        if trie is None: return False  # not in trie
    if trie == {}: return True  # leaf in trie
    return False  # in trie, but not a leaf
prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
trie = make_trie(complete_tokens)
# {'1': {'1': {'2': {}}, '2': {'0': {}, '1': {}, '3': {'3': {}, '4': {'5': {}}, '5': {}}}}}
res = {key: check_trie(trie, key) for key in prob_tokens}
# {'119': False, '120': True, '123': False, '1234': False, '12345': True}
1) Strictly speaking, the average key length is also a factor, but that is true of both approaches.
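If building a trie feels heavy, a sorted list with binary search is another way to avoid the pairwise scan. This is a different technique from the trie above, sketched under the assumption that tokens are plain strings compared lexicographically; the name check_sorted is made up for illustration. It relies on the fact that, in sorted order, any string that extends a key sits directly after that key.

```python
from bisect import bisect_left, bisect_right

prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']

# Deduplicate and sort once up front
sorted_tokens = sorted(set(complete_tokens))


def check_sorted(tokens, key):
    # Membership test via binary search
    i = bisect_left(tokens, key)
    if i == len(tokens) or tokens[i] != key:
        return False
    # If any extension of key exists, the entry right after key is one
    j = bisect_right(tokens, key)
    return not (j < len(tokens) and tokens[j].startswith(key))


res = {key: check_sorted(sorted_tokens, key) for key in prob_tokens}
print(res)
```

Sorting costs roughly O(n log n) comparisons and each lookup needs O(log n) more, so this sits between the pairwise scan and the trie while using only the standard library's bisect module.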
Answer 2 (score: 3)
You can use any:
a = ['119', '120', '123', '1234', '12345']
b = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
new_d = {c:c in b and not any(i.startswith(c) and len(c) < len(i) for i in b) for c in a}
Output:
{'120': True, '1234': False, '119': False, '123': False, '12345': True}
Answer 3 (score: 2)
This could be another option:
import re
prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
dictionary = dict()
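for tok in prob_tokens:
    # Matches tok followed by at least one more digit, i.e. a longer token with this prefix
    pattern = re.compile(r'^%s\d+' % re.escape(tok))
    if tok not in complete_tokens or any(pattern.search(tok2) for tok2 in complete_tokens):
        dictionary[tok] = False
    else:
        dictionary[tok] = True
print(dictionary)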
Answer 4 (score: 2)
I think you could also try something like this:
from collections import Counter
prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
result = {}
for token in prob_tokens:
    token_len = len(token)
    # Create counts of prefix lengths
    counts = Counter(c[:token_len] for c in complete_tokens)
    # Set True if only one prefix found, else False
    result[token] = counts[token] == 1
print(result)
Which outputs:
{'119': False, '120': True, '123': False, '1234': False, '12345': True}
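The loop above rebuilds the Counter for every token. A possible variation, sketched here with the made-up names complete_set and prefix_counts, counts all prefixes once up front and also checks membership explicitly. It is an illustration under those assumptions, not the answer's original method.

```python
from collections import Counter

prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']

complete_set = set(complete_tokens)

# For every prefix of every token, count how many tokens start with it
prefix_counts = Counter(
    c[:i] for c in complete_set for i in range(1, len(c) + 1)
)

# True only if the token is present and is the sole token with that prefix
result = {
    token: token in complete_set and prefix_counts[token] == 1
    for token in prob_tokens
}
print(result)
```

The trade-off is memory: the counter stores one entry per distinct prefix, in exchange for a single pass over complete_tokens.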
Answer 5 (score: 2)
A plain dict comprehension that sets the value to True when exactly one element of complete_tokens starts with the given key will do the job:
prob_tokens = ['119', '120', '123', '1234', '12345']
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
res = {elem:sum(v.startswith(elem) for v in complete_tokens)==1 for elem in prob_tokens}
print (res)
Output:
{'119': False, '120': True, '123': False, '1234': False, '12345': True}
For better efficiency, you can convert complete_tokens to a set and use any, which stops at the first match instead of checking every element:
complete_tokens_set = set(complete_tokens)
res = {elem:elem in complete_tokens_set and not any(v!=elem and v.startswith(elem) for v in complete_tokens_set) for elem in prob_tokens}
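One thing worth checking with the count-based version: it assumes every key that matters is itself in complete_tokens. The sketch below uses a made-up key '11', which is not in the list but is a prefix of exactly one token, to compare the two expressions from this answer.

```python
complete_tokens = ['112', '120', '121', '123', '1233', '1234', '1235', '12345']
complete_tokens_set = set(complete_tokens)

key = '11'  # hypothetical key, not part of the original prob_tokens

# Count-based check: exactly one token ('112') starts with '11'
by_count = sum(v.startswith(key) for v in complete_tokens) == 1

# Set-based check: requires the key itself to be present
by_set = key in complete_tokens_set and not any(
    v != key and v.startswith(key) for v in complete_tokens_set
)

print(by_count, by_set)
```

Here the count-based expression reports True for a key that is absent from complete_tokens, while the set-based one reports False, so the second form is the safer choice if such keys can occur.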