I have a large list of strings, for example:
full_log = ['AB21','BG54','HG89','NS72','Error','CF54','SD62','KK02','FE34']
and several smaller lists of strings, for example:
tc1 = ['HG89','NS72']
tc2 = ['AB21','BG54']
tc3 = ['KK02','FE34']
tc4 = ['CF54','SD62']
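I want to find each of the smaller lists inside the larger list while preserving the order, so that the output looks something like: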
tc2-tc1-Er-tc4-tc3
I'd like to know whether there is a straightforward, Pythonic way to handle this.
Answer 0 (score: 3)
You need to create a map (a dictionary) from the elements of each small list to that list's name:
m = {k: v for k, v in zip(map(tuple, [tc1, tc2, tc3, tc4]), ["tc1", "tc2", "tc3", "tc4"])}
>>> {('HG89', 'NS72'): 'tc1', ('AB21', 'BG54'): 'tc2', ('KK02', 'FE34'): 'tc3', ('CF54', 'SD62'): 'tc4'}
Then you can walk through the list with an iterator, taking two items at a time:
itr = iter(full_log)
for i in itr:
    if i != "Error":
        n = next(itr, None)  # None if the log ends in the middle of a pair
        if n != "Error":
            if (i, n) in m:
                print(m[(i, n)])
            else:
                print("Er")
        else:
            print("Er")
    else:
        print("Er")
>>> tc2
tc1
Er
tc4
tc3
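The question asks for a single string such as tc2-tc1-Er-tc4-tc3 rather than one label per line. A minimal sketch that wraps the same pair-walking idea in a function and joins the labels with "-"; the names label_pairs and tcs are hypothetical, and it assumes every short list has exactly two items:

```python
def label_pairs(full_log, mapping, error="Error"):
    """Walk full_log two items at a time and join the labels with '-'.

    Assumes every short list has exactly two items (hypothetical helper).
    """
    labels = []
    itr = iter(full_log)
    for first in itr:
        if first == error:
            labels.append("Er")  # a lone error entry uses one slot
            continue
        second = next(itr, None)  # None if the log ends mid-pair
        # Unknown pairs (including ones containing an error) become "Er".
        labels.append(mapping.get((first, second), "Er"))
    return "-".join(labels)


full_log = ['AB21', 'BG54', 'HG89', 'NS72', 'Error', 'CF54', 'SD62', 'KK02', 'FE34']
tcs = {"tc1": ['HG89', 'NS72'], "tc2": ['AB21', 'BG54'],
       "tc3": ['KK02', 'FE34'], "tc4": ['CF54', 'SD62']}
m = {tuple(items): name for name, items in tcs.items()}
print(label_pairs(full_log, m))  # tc2-tc1-Er-tc4-tc3
```

If the second item of a pair is "Error", the whole pair is reported as a single Er, just as in the loop above.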
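If you don't mind expanding each "Error" entry in the first list into two entries, so that every element of the log takes up a full pair: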
full_log2 = [item for sublist in [[i] if i != "Error" else ["Error", "Error"] for i in full_log] for item in sublist]
>>> ['AB21', 'BG54', 'HG89', 'NS72', 'Error', 'Error', 'CF54', 'SD62', 'KK02', 'FE34']
Then you can use a list comprehension:
print([m[(full_log2[i], full_log2[i+1])] if (full_log2[i], full_log2[i+1]) in m else "Er" for i in range(0, len(full_log2)-1, 2)])
>>> ['tc2', 'tc1', 'Er', 'tc4', 'tc3']
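The pairing step can also be written with the zip(it, it) idiom, which pulls two consecutive items from the same iterator on each step and so yields non-overlapping pairs. A sketch, assuming the same m as above (rebuilt here so the block runs on its own):

```python
full_log = ['AB21', 'BG54', 'HG89', 'NS72', 'Error', 'CF54', 'SD62', 'KK02', 'FE34']
m = {('HG89', 'NS72'): 'tc1', ('AB21', 'BG54'): 'tc2',
     ('KK02', 'FE34'): 'tc3', ('CF54', 'SD62'): 'tc4'}

# Duplicate every "Error" so it fills a whole pair slot.
full_log2 = [x for item in full_log
             for x in ([item, item] if item == "Error" else [item])]

# zip(it, it) takes two items per step from ONE iterator;
# a trailing unpaired item, if any, is silently dropped.
it = iter(full_log2)
result = [m.get(pair, "Er") for pair in zip(it, it)]
print(result)  # ['tc2', 'tc1', 'Er', 'tc4', 'tc3']
```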
Answer 1 (score: 2)
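If all of your short lists have the same length, you can simply create a dict whose keys are tuples of strings and whose values are the labels. Then walk through full_log, take a chunk of the appropriate length, and check whether it can be found in the dict.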
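A minimal sketch of this equal-length approach, assuming every short list has the same length (passed in as size); label_fixed is a hypothetical name. Entries that do not start a known chunk, such as "Error", are kept as-is:

```python
def label_fixed(log, mapping, size):
    """Label fixed-size chunks of log; assumes all short lists have length `size`."""
    result = []
    i = 0
    while i < len(log):
        chunk = tuple(log[i:i + size])
        if chunk in mapping:
            result.append(mapping[chunk])
            i += size  # matched: jump past the whole chunk
        else:
            result.append(log[i])  # no match: keep the raw entry
            i += 1
    return result


full_log = ['AB21', 'BG54', 'HG89', 'NS72', 'Error', 'CF54', 'SD62', 'KK02', 'FE34']
m = {('HG89', 'NS72'): 'tc1', ('AB21', 'BG54'): 'tc2',
     ('KK02', 'FE34'): 'tc3', ('CF54', 'SD62'): 'tc4'}
print(label_fixed(full_log, m, 2))  # ['tc2', 'tc1', 'Error', 'tc4', 'tc3']
```

Advancing by one on a miss (rather than by size) is what lets the chunk after "Error" realign with CF54.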
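If the short lists differ in length, the approach above does not work, because the length of the chunk taken from full_log is not constant. In that case, one possible solution is to insert the items of the short lists into a tree structure whose leaf nodes are the labels. Then, for each index in full_log, check whether a path can be found in the tree. If a path is found, jump forward past it; otherwise try again from the next index: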
from collections import defaultdict
from itertools import islice
full_log = ['AB21','BG54','HG89','NS72','Error','CF54','SD62','KK02','FE34']
# Construct a tree
dd = lambda: defaultdict(dd)
labels = defaultdict(dd)
labels['HG89']['NS72'] = 'tc1'
labels['AB21']['BG54'] = 'tc2'
labels['KK02']['FE34'] = 'tc3'
labels['CF54']['SD62'] = 'tc4'
# Find label, return tuple (label, length) or (None, 1)
def find_label(it):
    length = 0
    node = labels
    while node and isinstance(node, dict):
        node = node.get(next(it, None))
        length += 1
    return node, (length if node else 1)
i = 0
result = []
while i < len(full_log):
    label, length = find_label(islice(full_log, i, None))
    result.append(label if label else full_log[i])
    i += length
print(result)  # ['tc2', 'tc1', 'Error', 'tc4', 'tc3']
The tree used above is essentially a trie, except that a node holds either child nodes or a value (the label), never both.
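Because each node holds either children or a label, that tree cannot represent one short list that is a prefix of another (for example ['CF54'] and ['CF54', 'SD62']). A sketch of a variant that uses plain dicts, stores the label under a separate sentinel key, and takes the longest match at each position; END, build_trie, longest_match and label_variable are hypothetical names:

```python
END = object()  # hypothetical sentinel key: "a short list ends at this node"


def build_trie(named_lists):
    """Insert every short list into a nested-dict trie."""
    root = {}
    for label, items in named_lists.items():
        node = root
        for item in items:
            node = node.setdefault(item, {})
        node[END] = label  # a node may now have both children and a label
    return root


def longest_match(trie, log, start):
    """Return (label, length) of the longest short list starting at log[start],
    or (None, 1) if none matches."""
    node = trie
    best = (None, 1)
    for pos in range(start, len(log)):
        node = node.get(log[pos])
        if node is None:
            break
        if END in node:
            best = (node[END], pos - start + 1)
    return best


def label_variable(log, trie):
    result, i = [], 0
    while i < len(log):
        label, length = longest_match(trie, log, i)
        result.append(label if label is not None else log[i])
        i += length
    return result


full_log = ['AB21', 'BG54', 'HG89', 'NS72', 'Error', 'CF54', 'SD62', 'KK02', 'FE34']
trie = build_trie({"tc1": ['HG89', 'NS72'], "tc2": ['AB21', 'BG54'],
                   "tc3": ['KK02', 'FE34'], "tc4": ['CF54', 'SD62']})
print(label_variable(full_log, trie))  # ['tc2', 'tc1', 'Error', 'tc4', 'tc3']
```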
Answer 2 (score: 0)
You can use a set for pattern matching:
full_log = ['AB21','BG54','HG89','NS72','Error','CF54','SD62','KK02','FE34']
tc1 = ['HG89','NS72']
tc2 = ['AB21','BG54']
tc3 = ['KK02','FE34']
tc4 = ['CF54','SD62']
set(full_log) & set(tc1)
Output: {'HG89', 'NS72'}
# Find the indices of the set elements:
result = set(full_log) & set(tc1)
def all_indices(value, qlist):
    indices = []
    idx = -1
    while True:
        try:
            idx = qlist.index(value, idx + 1)
            indices.append(idx)
        except ValueError:
            break
    return indices
r = []
for value in sorted(result):  # sort, since set iteration order is arbitrary
    s = all_indices(value, full_log)
    r.append(s)
r
Output: [[2], [3]]
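Set intersection ignores order and adjacency, so it cannot tell whether HG89 and NS72 actually appear next to each other in full_log. A sketch that instead compares slices to find every contiguous occurrence of each short list and sorts the hits by position; find_sublist and tcs are hypothetical names:

```python
def find_sublist(big, small):
    """Return every start index where `small` occurs contiguously in `big`."""
    n = len(small)
    return [i for i in range(len(big) - n + 1) if big[i:i + n] == small]


full_log = ['AB21', 'BG54', 'HG89', 'NS72', 'Error', 'CF54', 'SD62', 'KK02', 'FE34']
tcs = {"tc1": ['HG89', 'NS72'], "tc2": ['AB21', 'BG54'],
       "tc3": ['KK02', 'FE34'], "tc4": ['CF54', 'SD62']}

# (start index, name) for every contiguous occurrence, in log order
hits = sorted((start, name)
              for name, small in tcs.items()
              for start in find_sublist(full_log, small))
print(hits)  # [(0, 'tc2'), (2, 'tc1'), (5, 'tc4'), (7, 'tc3')]
```

Joining the names from hits with "-" gives tc2-tc1-tc4-tc3; the Error entry does not appear because it belongs to no short list.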