I have a pandas pivot table that looks like this:
C             bar       foo
A     B
one   A -1.154627 -0.243234
three A -1.327977  0.243234
      B  1.327977 -0.079051
      C -0.832506  1.327977
two   A  1.327977 -0.128534
      B  0.835120  1.327977
      C  1.327977  0.838040
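I would like to filter out every value of index level A that has fewer than 2 rows in level B, so that the table above would lose A = one: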
C             bar       foo
A     B
three A -1.327977  0.243234
      B  1.327977 -0.079051
      C -0.832506  1.327977
two   A  1.327977 -0.128534
      B  0.835120  1.327977
      C  1.327977  0.838040
How can I do this?
Answer 0 (score: 7)
In one line:
In [64]: df[df.groupby(level=0).bar.transform(lambda x: len(x) >= 2).astype('bool')]
Out[64]:
              bar       foo
two   A  0.944908  0.701687
      B -0.204075  0.713141
      C  0.730844 -0.022302
three A  1.263489 -1.382653
      B  0.124444  0.907667
      C -2.407691 -0.773040
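The same boolean mask can also be built without a lambda. The following is only a minimal sketch, assuming a reasonably recent pandas in which transform accepts the string "size" (that is not the version used in the session above), and using made-up random data in place of the original frame:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the frame in the question: same index, random values.
idx = pd.MultiIndex.from_tuples(
    [("one", "A"), ("three", "A"), ("three", "B"), ("three", "C"),
     ("two", "A"), ("two", "B"), ("two", "C")],
    names=["A", "B"],
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((7, 2)), index=idx, columns=["bar", "foo"])

# Broadcast each group's row count back onto its rows, then compare.
group_size = df.groupby(level="A")["bar"].transform("size")
result = df[group_size >= 2]
print(result)
```

Because the comparison already yields a boolean Series, no astype call is needed here.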
In the upcoming pandas release (0.11.1), the new filter method achieves this faster and more intuitively:
In [65]: df.groupby(level=0).filter(lambda x: len(x['bar']) >= 2)
Out[65]:
              bar       foo
three A  1.263489 -1.382653
      B  0.124444  0.907667
      C -2.407691 -0.773040
two   A  0.944908  0.701687
      B -0.204075  0.713141
      C  0.730844 -0.022302
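For anyone who wants to try the filter approach from scratch, here is a self-contained sketch. The random values are placeholders rather than the data shown above, and it groups by the level name "A" instead of level=0, which assumes the index levels are named as in the question:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: the index mirrors the question, the values are random.
idx = pd.MultiIndex.from_tuples(
    [("one", "A"), ("three", "A"), ("three", "B"), ("three", "C"),
     ("two", "A"), ("two", "B"), ("two", "C")],
    names=["A", "B"],
)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((7, 2)), index=idx, columns=["bar", "foo"])

# Keep only the groups of level A that contain at least 2 rows.
filtered = df.groupby(level="A").filter(lambda g: len(g) >= 2)
print(filtered.index.get_level_values("A").unique().tolist())  # ['three', 'two']
```

The callable passed to filter receives each group as a DataFrame and must return a single True or False, so len(g) is enough and no particular column has to be selected.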
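Answer 1 (score: 2)
One way is to group by 'A' and then keep the groups of size 3 or more (the question's threshold of 2 gives the same result on this data):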
In [11]: g = df1.groupby(level='A')
In [12]: g.size()
Out[12]:
A
one 1
three 3
two 3
dtype: int64
In [13]: size = g.size()
In [13]: big_size = size[size>=3]
In [14]: big_size
Out[14]:
A
three 3
two 3
dtype: int64
Then you can see which rows have a "good" 'A' value, and slice by that mask:
In [15]: good_A = df1.index.get_level_values('A').isin(big_size.index)
In [16]: good_A
Out[16]: array([False, True, True, True, True, True, True], dtype=bool)
In [17]: df1[good_A]
Out[17]:
              bar       foo
A     B
three A -1.327977  0.243234
      B  1.327977 -0.079051
      C -0.832506  1.327977
two   A  1.327977 -0.128534
      B  0.835120  1.327977
      C  1.327977  0.838040
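The steps above can be gathered into one runnable sketch. The frame is rebuilt from the values printed in the question, and drop_small_groups is a hypothetical helper name introduced here for illustration, not something pandas provides:

```python
import pandas as pd

# Rebuild the frame from the values shown in the question.
index = pd.MultiIndex.from_tuples(
    [("one", "A"), ("three", "A"), ("three", "B"), ("three", "C"),
     ("two", "A"), ("two", "B"), ("two", "C")],
    names=["A", "B"],
)
df1 = pd.DataFrame(
    {
        "bar": [-1.154627, -1.327977, 1.327977, -0.832506, 1.327977, 0.835120, 1.327977],
        "foo": [-0.243234, 0.243234, -0.079051, 1.327977, -0.128534, 1.327977, 0.838040],
    },
    index=index,
)


def drop_small_groups(frame, level, min_rows):
    """Hypothetical helper: keep rows whose `level` value occurs at least `min_rows` times."""
    sizes = frame.groupby(level=level).size()
    keep = sizes[sizes >= min_rows].index
    mask = frame.index.get_level_values(level).isin(keep)
    return frame[mask]


result = drop_small_groups(df1, "A", 2)
print(result)
```

Here the threshold is passed as 2 to match the question; with 3, as in the session above, the result is identical for this data.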