I have a list structured as follows:
arr = [['a'],
       ['a','b'],
       ['a','x','y'],
       ['a','c'],
       ['a','c','a'],
       ['a','c','b'],
       ['a','c','b','a'],
       ['a','c','b','b'],
       ['a','d'],
       ['b'],
       ['b','c'],
       ['b','c','a'],
       ['b','c','b'],
       ['c','d'],
       ['c','d','e'],
       ['c','d','f'],
       ['c','d','f','a'],
       ['c','d','f','b'],
       ['c','d','f','b','a'],
      ]
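As you can see, the list contains some unique elements, and the entries that follow build on a unique element until a new unique element appears. These should fall into categories and subcategories. So ['a'], ['b'] and ['c','d'] are the broad main categories, and within them there are further subcategories, nested by the same principle. Ideally, I would like to have the categories and subcategories as a dictionary. The final result should look something like: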
{'a': ['a-b',
       'a-x-y',
       {'a-c':
           ['a-c-a',
            {'a-c-b':
                ['a-c-b-a',
                 'a-c-b-b']
            }]
       }
      ],
 'b' : ................
 'c-d': ...............}
I may also use only the first level of subcategories and discard the rest entirely. In that case, the output would be:
{'a': ['a-b', 'a-x-y', 'a-c', 'a-d'], 'b': ['b-c'], 'c-d': ['c-d-e', 'c-d-f']}
I have written code for the second scenario, but I am not sure whether this is an efficient way to solve the problem:
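def arrange(arr):
    # Start with the first item as the main category and the second as its first child
    cat = {"-".join(arr[0]): ["-".join(arr[1])]}
    main = 0
    for i in range(2, len(arr)):
        l = len(arr[main])
        if arr[main] == arr[i][0:l]:
            # arr[i] starts with the current main category: collect it
            cat["-".join(arr[main])].append("-".join(arr[i]))
        else:
            # arr[i] starts a new main category
            cat["-".join(arr[i])] = []
            main = i
    for k, v in cat.items():
        i = 0
        while i < len(v) - 1:
            f_idx = i + 1
            # Drop deeper entries that contain v[i]; the bounds check stops
            # the loop once everything after v[i] has been removed
            while f_idx < len(v) and v[i] in v[f_idx]:
                v.pop(f_idx)
            i += 1
    return cat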
Output:
{'a': ['a-b', 'a-x-y', 'a-c', 'a-d'], 'b': ['b-c'], 'c-d': ['c-d-e', 'c-d-f']}
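Please help me write this code better, or help me build a dictionary with the full structure, including all the subcategories. Thanks.
Answer 0 (score: 0)
In the end, I believe what I have described here is the first-level subclassification, discarding the rest entirely.
The trick is to switch to a new key whenever the current key is no longer a sublist of the item that follows; otherwise the item is collected as a value of the current key.
The same sublist check is then used to remove the duplicates, that is, the deeper entries that contain an entry already kept.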
from collections import defaultdict

# Function that checks whether sublst occurs contiguously in lst,
# even with duplicate items
def contains_sublist(lst, sublst):
    n = len(sublst)
    # range works on both Python 2 and Python 3
    return any((sublst == lst[i:i+n]) for i in range(len(lst)-n+1))

# Define default dict of list
aDict = defaultdict(list)
it = iter(arr)
# Format key; keep the first item itself as the list to compare against
s = next(it)
key = '-'.join(s)
# Loop that collects keys if key is not sublist else values
for l in it:
    if contains_sublist(l, s):
        aDict[key].append(l)
    else:
        key = '-'.join(l)
        s = l
# Loop to remove duplicate items based upon recurrence of sublist
for k in list(aDict.keys()):
    dellist = []
    for s in aDict[k]:
        for l in aDict[k]:
            if l != s:
                if contains_sublist(l, s):
                    if l not in dellist:
                        dellist.append(l)
    for l in dellist:
        try:
            aDict[k].remove(l)
        except ValueError:
            pass
# Create final dict by concatenating list of list with '-'
finaldict = {k: ['-'.join(i) for i in v] for k, v in aDict.items()}
Result:
Python 2.7.9 (default, Dec 10 2014, 12:24:55) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ================================ RESTART ================================
>>>
>>> finaldict
{'a': ['a-b', 'a-x-y', 'a-c', 'a-d'], 'b': ['b-c'], 'c-d': ['c-d-e', 'c-d-f']}
>>>
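The same first-level grouping can also be sketched in a single pass by comparing prefixes instead of searching for sublists. This is only a minimal sketch, and it assumes (as in the question's data) that every entry appears after its parent; the function name group_first_level is hypothetical and not part of the answer above.

```python
arr = [['a'], ['a','b'], ['a','x','y'], ['a','c'], ['a','c','a'],
       ['a','c','b'], ['a','c','b','a'], ['a','c','b','b'], ['a','d'],
       ['b'], ['b','c'], ['b','c','a'], ['b','c','b'],
       ['c','d'], ['c','d','e'], ['c','d','f'], ['c','d','f','a'],
       ['c','d','f','b'], ['c','d','f','b','a']]

def group_first_level(paths):
    """Group paths under main categories, keeping only first-level children.

    Assumes each path is listed after its parent (hypothetical helper).
    """
    groups = {}
    main = None    # current main category, as a list
    child = None   # most recent first-level child that was kept
    for path in paths:
        if main is None or path[:len(main)] != main:
            # path does not start with the current main category: new category
            main = path
            child = None
            groups['-'.join(main)] = []
        elif child is not None and path[:len(child)] == child:
            # deeper descendant of the last kept child: discard it
            continue
        else:
            # a new first-level child of the current main category
            child = path
            groups['-'.join(main)].append('-'.join(path))
    return groups

print(group_first_level(arr))
# {'a': ['a-b', 'a-x-y', 'a-c', 'a-d'], 'b': ['b-c'], 'c-d': ['c-d-e', 'c-d-f']}
```

Comparing prefixes avoids matching a key that merely occurs somewhere in the middle of a later item, which a general sublist search would accept.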
Answer 1 (score: 0)
You are describing a Trie.
Here is a very basic implementation:
def make_trie(words):
    root = dict()
    for word in words:
        current_dict = root
        for letter in word:
            current_dict = current_dict.setdefault(letter, {})
        current_dict[1] = 1
    return root

trie = make_trie(arr)
print(trie)
# {'a': {1: 1, 'c': {'a': {1: 1}, 1: 1, 'b': {'a': {1: 1}, 1: 1, 'b': {1: 1}}}, 'b': {1: 1}, 'd': {1: 1}, 'x': {'y': {1: 1}}}, 'c': {'d': {1: 1, 'e': {1: 1}, 'f': {'a': {1: 1}, 1: 1, 'b': {'a': {1: 1}, 1: 1}}}}, 'b': {1: 1, 'c': {'a': {1: 1}, 1: 1, 'b': {1: 1}}}}
print(trie.get('a',{}).get('x',{}))
# {'y': {1: 1}}
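This trie is just nested dicts, so it is easy to iterate over all children of ['a', 'x'], or to select all dicts up to a maximum depth of 2.
The key 1 marks a complete entry (a leaf word). It is needed, for example, when you have ['a', 'x', 'y'] as a sub-array but not ['a', 'x']: the node for 'x' exists in the trie but carries no 1 marker.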
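Both kinds of traversal can be sketched with one small generator. This is an illustrative sketch only: entries_below and its max_len parameter are hypothetical names, make_trie is repeated so the block runs on its own, and a shortened sample list stands in for arr.

```python
def make_trie(words):
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node[1] = 1  # marker: a complete entry ends at this node
    return root

def entries_below(node, prefix=(), max_len=None):
    """Yield every complete entry in node's subtree as a tuple.

    prefix is the path that leads to node; max_len (optional) limits
    the total length of the entries that are yielded.
    """
    if 1 in node:
        yield prefix
    if max_len is not None and len(prefix) >= max_len:
        return
    for letter, child in node.items():
        if letter != 1:  # skip the end-of-entry marker
            yield from entries_below(child, prefix + (letter,), max_len)

sample = [['a'], ['a', 'b'], ['a', 'x', 'y'], ['a', 'c'], ['a', 'c', 'a']]
trie = make_trie(sample)

# All complete entries below ['a', 'x']
print(list(entries_below(trie['a']['x'], ('a', 'x'))))
# [('a', 'x', 'y')]

# All complete entries with at most 2 elements
print(sorted(entries_below(trie, max_len=2)))
# [('a',), ('a', 'b'), ('a', 'c')]
```

Note that ('a', 'x') is not reported, because its node has no 1 marker.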
Python has more complete Trie libraries, such as pygtrie.
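If the exact layout from the question is wanted (a dict of lists in which a subcategory with children becomes a one-key dict), it can be produced with a stack of open ancestors rather than a trie. The following is a rough sketch under the assumption that each entry is listed after its parent; build_nested and the shortened sample list are hypothetical and only for illustration.

```python
def build_nested(paths):
    """Build {main: [child or {child: [...]}, ...]} from ordered paths."""
    roots = []   # top-level (main category) nodes
    stack = []   # chain of ancestors that are still open
    for path in paths:
        node = {'path': path, 'children': []}
        # Close every open ancestor that is not a prefix of this path
        while stack and path[:len(stack[-1]['path'])] != stack[-1]['path']:
            stack.pop()
        if stack:
            stack[-1]['children'].append(node)
        else:
            roots.append(node)
        stack.append(node)

    def render(node):
        name = '-'.join(node['path'])
        if not node['children']:
            return name  # leaf: plain string
        return {name: [render(c) for c in node['children']]}

    return {'-'.join(r['path']): [render(c) for c in r['children']]
            for r in roots}

sample = [['a'], ['a', 'b'], ['a', 'c'], ['a', 'c', 'a'], ['a', 'c', 'b'],
          ['a', 'c', 'b', 'a'], ['a', 'd'], ['c', 'd'], ['c', 'd', 'e']]
print(build_nested(sample))
# {'a': ['a-b', {'a-c': ['a-c-a', {'a-c-b': ['a-c-b-a']}]}, 'a-d'], 'c-d': ['c-d-e']}
```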