I am trying to remove duplicate rows within each chunk of a grouped DataFrame. A toy example:
import pandas as pd
import numpy as np
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.ones([8,2]), index=index)
print(df)
The output is:
              0  1
first second
bar   one     1  1
      two     1  1
baz   one     1  1
      two     1  1
foo   one     1  1
      two     1  1
qux   one     1  1
      two     1  1
However, if I try
print(df.groupby(level='first').apply(lambda d: d.drop_duplicates()))
then I get
                    0  1
first first second
bar   bar   one     1  1
baz   baz   one     1  1
foo   foo   one     1  1
qux   qux   one     1  1
Is there a way to get what I need without the extra "first" index level?
Answer 0: (score: 0)
Pass group_keys=False to groupby:
In [273]:
df.groupby(level='first', group_keys=False).apply(lambda d: d.drop_duplicates())
Out[273]:
              0  1
first second
bar   one     1  1
baz   one     1  1
foo   one     1  1
qux   one     1  1
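For reference, here is a self-contained sketch of the same idea that can be pasted into a fresh session. By default, groupby(...).apply prepends the group key to the result index, which is where the second "first" level comes from; group_keys=False turns that off. The names deduped, mask and deduped_alt are illustrative and not from the original post, and the second approach (a boolean mask built with duplicated) is only an assumed alternative that avoids apply, not something the answer itself proposes.

```python
import numpy as np
import pandas as pd

# Rebuild the toy frame from the question.
arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])
df = pd.DataFrame(np.ones([8, 2]), index=index)

# group_keys=False: do not prepend the group key to the result index,
# so the result keeps only the original ('first', 'second') levels.
deduped = (
    df.groupby(level='first', group_keys=False)
      .apply(lambda d: d.drop_duplicates())
)
print(deduped)

# Assumed alternative without apply: move 'first' into the columns so that
# duplicated() compares rows per group, then filter the original frame.
mask = df.reset_index(level='first').duplicated().to_numpy()
deduped_alt = df[~mask]
print(deduped_alt)
```

Both variants keep the first row of each group of identical rows, since drop_duplicates and duplicated default to keep='first'.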