So I have a list of tuples like this:
[
('Worksheet',),
('1a', 'Calculated'),
('None', 'None', 'None', 'None', 'None'),
('1b', 'General'),
('1b', 'General', 'Basic'),
('1b', 'General', 'Basic', 'Data'),
('1b', 'General', 'Basic', 'Data', 'Line 1'),
('1b', 'General', 'Basic', 'Data', 'Line 2'),
('None', 'None', 'None', 'None', 'None'),
('1c', 'General'),
('1c', 'General', 'Basic'),
('1c', 'General', 'Basic', 'Data'),
('None', 'None', 'None', 'None', 'None'),
('2', 'Active'),
('2', 'Active', 'Passive'),
('None', 'None', 'None', 'None', 'None'),
...
]
Each tuple has a length of 1 to 5. I need to recursively reduce the list to end up with:
[
('Worksheet',),
('1a', 'Calculated'),
('None', 'None', 'None', 'None', 'None'),
('1b', 'General', 'Basic', 'Data', 'Line 1'),
('1b', 'General', 'Basic', 'Data', 'Line 2'),
('None', 'None', 'None', 'None', 'None'),
('1c', 'General', 'Basic', 'Data'),
('None', 'None', 'None', 'None', 'None'),
('2', 'Active', 'Passive'),
('None', 'None', 'None', 'None', 'None'),
...
]
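Basically, if the next row matches all of the previous row plus one more item, the previous row should be removed, so that each run is reduced to the longest tuple(s) with the same hierarchy.
So in my example there were 3 rows with 1c as the first item in the tuple, and they were reduced to the longest one.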
Answer 0 (score: 1)
Group the tuples on their first element using itertools.groupby() (operator.itemgetter() makes it easy to create the key).
Then filter each group separately:
from itertools import groupby, chain
from operator import itemgetter

def filtered_group(group):
    group = list(group)
    maxlen = max(len(l) for l in group)
    return [l for l in group if len(l) == maxlen]

filtered = [filtered_group(g) for k, g in groupby(inputlist, key=itemgetter(0))]
output = list(chain.from_iterable(filtered))
Demo:
>>> from itertools import groupby, chain
>>> from operator import itemgetter
>>> from pprint import pprint
>>> def filtered_group(group):
...     group = list(group)
...     maxlen = max(len(l) for l in group)
...     return [l for l in group if len(l) == maxlen]
...
>>> filtered = [filtered_group(g) for k, g in groupby(inputlist, key=itemgetter(0))]
>>> pprint(list(chain.from_iterable(filtered)))
[('Worksheet',),
('1a', 'Calculated'),
('None', 'None', 'None', 'None', 'None'),
('1b', 'General', 'Basic', 'Data', 'Line 1'),
('1b', 'General', 'Basic', 'Data', 'Line 2'),
('None', 'None', 'None', 'None', 'None'),
('1c', 'General', 'Basic', 'Data'),
('None', 'None', 'None', 'None', 'None'),
('2', 'Active', 'Passive'),
('None', 'None', 'None', 'None', 'None')]
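One detail this approach relies on: groupby() only merges consecutive items that share a key, which is why every ('None', ...) separator row survives as its own group. A minimal sketch with a made-up list (`rows` is hypothetical, not the asker's data):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample: the key 'a' appears in two separate runs.
rows = [('a',), ('a', 'x'), ('None',), ('a',), ('a', 'y', 'z')]

# groupby() starts a new group whenever the key changes, so the two
# runs of 'a' are never merged into a single group.
runs = [(key, [len(t) for t in group])
        for key, group in groupby(rows, key=itemgetter(0))]
print(runs)
```

Each run is filtered on its own, so a first element that reappears after a separator starts a fresh group.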
Answer 1 (score: 1)
from pprint import pprint
l=[
('Worksheet',),
('1a', 'Calculated'),
('None', 'None', 'None', 'None', 'None'),
('1b', 'General'),
('1b', 'General', 'Basic'),
('1b', 'General', 'Basic', 'Data'),
('1b', 'General', 'Basic', 'Data', 'Line 1'),
('1b', 'General', 'Basic', 'Data', 'Line 2'),
('None', 'None', 'None', 'None', 'None'),
('1c', 'General'),
('1c', 'General', 'Basic'),
('1c', 'General', 'Basic', 'Data'),
('None', 'None', 'None', 'None', 'None'),
('2', 'Active'),
('2', 'Active', 'Passive'),
('None', 'None', 'None', 'None', 'None')
#...
]
i=0
while i<len(l)-1:
    l0=l[i]
    l1=l[i+1]
    if len(l1)==len(l0)+1 and l1[:-1]==l0:
        del l[i]
    else:
        i+=1
pprint(l)
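Logic: compare each row (except the last) with the one after it. If the next row is the same with one additional item, delete the first of the pair. Otherwise, advance to the next pair of rows.
This is not a recursive solution, but it could be rewritten as one. It is a filter operation where the condition needs to look at the next item.
Just for fun, here is a recursive Haskell version (this kind of recursion is efficient in Haskell and Scheme, but not in Python):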
prefixfilt :: Eq a => [[a]] -> [[a]]
prefixfilt [] = []
prefixfilt [x] = [x]
prefixfilt (x0:x1:xs) =
    if x0 == init x1 then rest else (x0:rest)
  where rest = prefixfilt (x1:xs)
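For comparison, the same recursion could be sketched in Python roughly as follows. This is only an illustrative translation of the Haskell above, not something from the original answer; the `second and` guard is an added assumption to cope with empty tuples, and because each call slices the list and adds a stack frame, it would hit Python's recursion limit on long inputs.

```python
def prefixfilt(rows):
    """Drop every row that equals the next row minus its last item."""
    if len(rows) < 2:
        return list(rows)
    first, second = rows[0], rows[1]
    rest = prefixfilt(rows[1:])
    # Mirrors `x0 == init x1`: second[:-1] is the next row without its last item.
    if second and first == second[:-1]:
        return rest
    return [first] + rest


# Hypothetical sample, shaped like the data in the question.
sample = [('1b', 'General'), ('1b', 'General', 'Basic'), ('None',), ('2', 'Active')]
print(prefixfilt(sample))
```

For real data the iterative `while` loop above is the safer choice in Python.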
Answer 2 (score: 1)
def is_subtuple(tup1, tup2):
    '''Return True if all the elements of tup1 are consecutively in tup2.'''
    if len(tup2) < len(tup1): return False
    try:
        offset = tup2.index(tup1[0])
    except ValueError:
        return False
    # This could be wrong if tup1[0] is in tup2, but doesn't start the subtuple.
    # You could solve this by recurring on the rest of tup2 if this is false, but
    # it doesn't apply to your input data.
    return tup1 == tup2[offset:offset+len(tup1)]
Then, just filter the input list (here named l):
[t for i, t in enumerate(l) if not any(is_subtuple(t, t2) for t2 in l[i+1:])]
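Now, this list comprehension assumes that the input list is consistently ordered the way you showed it, with subtuples appearing earlier than the tuples that contain them. It is also somewhat expensive (O(n**2), I think), but it will get the job done.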
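The caveat in the comment inside is_subtuple (the first occurrence of tup1[0] not being the start of the match) could be handled by trying every start offset instead of only the first one. A minimal sketch, where `is_subtuple_anywhere` is a hypothetical name and not part of the answer above:

```python
def is_subtuple_anywhere(tup1, tup2):
    """Return True if tup1 occurs as a consecutive run anywhere in tup2."""
    n = len(tup1)
    # Try every start position rather than only the first match of tup1[0].
    # If tup2 is shorter than tup1 the range is empty and the result is False.
    return any(tup1 == tup2[start:start + n]
               for start in range(len(tup2) - n + 1))


# The first 'a' in the longer tuple does not start the match, the second does.
print(is_subtuple_anywhere(('a', 'b'), ('a', 'x', 'a', 'b')))
```

Note that with the list comprehension above, a tuple that is identical to a later one also counts as contained in it (with either helper), so repeated rows such as the ('None', ...) separators would be reduced to their last occurrence.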