I have a list:
print (L)
[('bar', 'one'), ('bar', 'two'), ('baz', 'one'),
('baz', 'two'), ('foo', 'one'), ('qux', 'one'),
('qux', 'two'), ('oof', 'two'), ('oof', 'one'), ('oof', 'three')]
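I want to group by the first element of each tuple and keep only the groups whose second elements are exactly one and two.
So tuples such as ('oof', 'two') and ('foo', 'one') need to be filtered out, because foo has only one element and oof has three.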
Expected output, where each first element (bar, baz, qux) has the second elements one and two and its group has length 2:
print(L1)
[('bar', 'one'), ('bar', 'two'),
('baz', 'one'), ('baz', 'two'),
('qux', 'one'), ('qux', 'two')]
I tried:
L = [b in ['one','two'] for a,b in L]
print (L)
[True, True, True, True, True, True, True, True]
What is a nice / pythonic solution for this?
Answer 0 (score: 5)
Here is a solution using groupby:
import itertools, operator
# group the tuples by the first element
result = itertools.groupby(sorted(L), key=operator.itemgetter(0))
# convert the groups to lists
result = [list(group) for _, group in result]
# filter out those lists that don't contain exactly "one" and "two"
result = [group for group in result if set(y for x, y in group) == {'one', 'two'}]
# flatten the nested list into a list of tuples
result = [x for group in result for x in group]
print(result)
Note that this does not care about duplicate tuples:
L = [('bar', 'one'), ('bar', 'two'), ('bar', 'two')]
# result = [('bar', 'one'), ('bar', 'two'), ('bar', 'two')]
If you don't want those in the output, you can rewrite the filter condition (the second list comprehension) like this:
result = [group for group in result if
          set(y for x, y in group) == {'one', 'two'} and len(group) == 2]
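As a variation on the approach above, here is a minimal sketch that skips the sort and groups with a dictionary instead, so the surviving tuples keep their original order. The function name keep_exact_pairs and its wanted parameter are hypothetical names made up for this illustration; they are not part of the answer.

```python
from collections import defaultdict


def keep_exact_pairs(pairs, wanted=("one", "two")):
    """Keep tuples whose first element maps to exactly the `wanted` values.

    Hypothetical helper, for illustration only.
    """
    # Collect the second elements seen for each first element.
    groups = defaultdict(list)
    for first, second in pairs:
        groups[first].append(second)

    # Comparing sorted lists rejects groups with missing, extra,
    # or repeated second elements.
    target = sorted(wanted)
    good = {first for first, seconds in groups.items()
            if sorted(seconds) == target}

    # Filter the original list so the input order is preserved.
    return [pair for pair in pairs if pair[0] in good]


L = [('bar', 'one'), ('bar', 'two'), ('baz', 'one'),
     ('baz', 'two'), ('foo', 'one'), ('qux', 'one'),
     ('qux', 'two'), ('oof', 'two'), ('oof', 'one'), ('oof', 'three')]

print(keep_exact_pairs(L))
# [('bar', 'one'), ('bar', 'two'), ('baz', 'one'),
#  ('baz', 'two'), ('qux', 'one'), ('qux', 'two')]
```

Because the comparison uses sorted lists rather than sets, a group such as [('bar', 'one'), ('bar', 'two'), ('bar', 'two')] is dropped, which matches the stricter condition with len(group) == 2.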
Answer 1 (score: 3)
You can do this with pandas groupby, i.e.:
import pandas as pd
L = [('bar', 'one'), ('bar', 'two'), ('baz', 'one'),
('baz', 'two'), ('foo', 'one'), ('qux', 'one'),
('qux', 'two'), ('oof', 'two'), ('oof', 'one'), ('oof', 'three'),
('new','five'),('new','six')]
df = pd.DataFrame(L)
s = df.groupby(0).size()
temp = s[s==2].index
idx = df[df[0].isin(temp)].groupby(0)[1].apply(lambda x : all(x.isin(['one','two'])))
df[df[0].isin(idx[idx].index)].apply(tuple,1).tolist()
[('bar', 'one'),
('bar', 'two'),
('baz', 'one'),
('baz', 'two'),
('qux', 'one'),
('qux', 'two')]
Answer 2 (score: 1)
How about this one-liner?
data=[('bar', 'one'), ('bar', 'two'), ('baz', 'one'),
('baz', 'two'), ('foo', 'one'), ('qux', 'one'),
('qux', 'two'), ('oof', 'two'), ('oof', 'two')]
from itertools import groupby
print(list(filter(lambda x:len(x)==2 and sorted((x[1][1],x[0][1]))==['one','two'],[list(b) for a,b in groupby(data,key=lambda x:x[0])])))
Output:
[[('bar', 'one'), ('bar', 'two')], [('baz', 'one'), ('baz', 'two')], [('qux', 'one'), ('qux', 'two')]]
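One caveat about the one-liner: itertools.groupby only merges consecutive items with equal keys, so it works here because the sample data already lists equal first elements next to each other. The short sketch below shows the difference on a small made-up list (the data and the helper name first are assumptions for illustration).

```python
from itertools import groupby

# Hypothetical sample: the two 'bar' tuples are not adjacent.
data = [('bar', 'one'), ('baz', 'one'), ('bar', 'two'), ('baz', 'two')]


def first(item):
    """Return the grouping key, i.e. the first element of the tuple."""
    return item[0]


# Without sorting, each run of equal keys becomes its own group.
unsorted_keys = [key for key, _ in groupby(data, key=first)]

# Sorting by the same key first puts equal keys next to each other.
sorted_keys = [key for key, _ in groupby(sorted(data, key=first), key=first)]

print(unsorted_keys)  # ['bar', 'baz', 'bar', 'baz']
print(sorted_keys)    # ['bar', 'baz']
```

If the input order is not guaranteed, sorting before grouping (as the first answer does) avoids the split groups.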
Detailed version:
data=[('bar', 'one'), ('bar', 'two'), ('baz', 'one'),
('baz', 'two'), ('foo', 'one'), ('qux', 'one'),
('qux', 'two'), ('oof', 'two'), ('oof', 'one'), ('oof', 'three')]
# collect the second elements for each first element
dublicates={}
for i in data:
    if i[0] not in dublicates:
        dublicates[i[0]]=[i[1]]
    else:
        dublicates[i[0]].append(i[1])
print(dublicates)
final=[]
for j,i in dublicates.items():
    if len(i)==2:
        # test each value separately; `'one' and 'two' in i` only checks 'two'
        if 'one' in i and 'two' in i:
            final.extend([(j,'one'),(j, 'two')])
print(final)
Output:
[('baz', 'one'), ('baz', 'two'), ('qux', 'one'), ('qux', 'two'), ('bar', 'one'), ('bar', 'two')]
Answer 3 (score: 0)
The built-in sorted does this automatically. When sorting a list of tuples, it sorts by the first item, then by the second item, and so on.
from pprint import pprint
def is_interesting(element):
    a, b = element
    return b in ('one', 'two')
# some_list is the list of tuples to filter
result = sorted(filter(is_interesting, some_list))
pprint(result)
The output is:
[('bar', 'one'),
('bar', 'two'),
('baz', 'one'),
('baz', 'two'),
('foo', 'one'),
('oof', 'two'),
('qux', 'one'),
('qux', 'two')]
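To make the tuple ordering described in this answer concrete, here is a small sketch on a made-up list (the variable names are assumptions for illustration). It contrasts the default element-by-element comparison with sorting on the first element only.

```python
from operator import itemgetter

# Hypothetical sample data, deliberately out of order.
pairs = [('qux', 'two'), ('bar', 'two'), ('qux', 'one'), ('bar', 'one')]

# Default ordering: compare first elements, then second elements on ties.
full_sort = sorted(pairs)

# Sorting on the first element only; sorted() is stable, so tuples with
# the same first element keep their original relative order.
first_only = sorted(pairs, key=itemgetter(0))

print(full_sort)
# [('bar', 'one'), ('bar', 'two'), ('qux', 'one'), ('qux', 'two')]
print(first_only)
# [('bar', 'two'), ('bar', 'one'), ('qux', 'two'), ('qux', 'one')]
```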