You can write your own filter function and apply it to each group with groupby(...).apply():
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(100, 2), columns=['A', 'B'], index=np.random.choice(['X', 'Y', 'Z'], 100))
print(df)  # random data, so your values will differ
A B
Y -0.6444 0.9515
Y 0.0541 0.1810
X 1.0280 -2.1507
Y 0.5513 -0.6256
X -1.4126 0.8487
Y -0.4272 -0.7669
Z -0.3358 0.8212
Z -0.0328 -1.1885
Y 0.9210 1.7363
Z 1.2619 -2.5311
.. ... ...
Y 0.4495 -0.1995
Y -0.5025 0.8696
Z -0.3178 0.5244
X 1.5752 -0.1915
Z 0.2572 0.1216
X -0.5613 1.7869
Y -0.4322 1.4184
Z 0.2402 0.9258
Z -0.3328 1.7380
X -1.9155 0.0929
[100 rows x 2 columns]
def my_filter(group):
    # keep only the rows of this group where A^2 + B^2 > 1
    selector = (group.A ** 2 + group.B ** 2) > 1
    return group[selector]
result = df.groupby(level=0).apply(my_filter)
print(result)
A B
X X 1.0280 -2.1507
X -1.4126 0.8487
X -0.6299 0.8297
X 0.8790 -0.5672
X -2.1781 1.8232
X 0.4533 -1.1098
X 0.8996 -0.6523
X -2.6023 0.2152
X 1.5641 -1.0823
X -0.4553 1.0037
.. ... ...
Z Z -0.7860 1.3643
Z 0.7350 -1.3309
Z 0.9675 -0.9975
Z -1.0461 -0.8538
Z -0.9659 1.7430
Z -0.9788 0.3100
Z 1.6457 1.7855
Z -2.0771 0.4892
Z 0.0399 -1.6994
Z -0.3328 1.7380
[61 rows x 2 columns]
This removed 39 rows (from 100 down to 61). Because the data is random, your counts will differ.
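A self-contained version of the groupby approach, for checking it locally. It is a sketch under a couple of assumptions: it swaps in a seeded np.random.default_rng(0) generator so runs are repeatable, and the names by_group and row_mask are illustrative only. It also shows that, because this particular condition looks at one row at a time, a plain boolean mask picks out the same rows without any grouping.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption; the original used unseeded np.random) for repeatable runs
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 2)), columns=['A', 'B'],
                  index=rng.choice(['X', 'Y', 'Z'], 100))

def my_filter(group):
    # keep only the rows of this group where A^2 + B^2 > 1
    return group[(group.A ** 2 + group.B ** 2) > 1]

# Per-group filter; with group_keys=True (the default) the group label
# is prepended to the result as an extra index level
by_group = df.groupby(level=0).apply(my_filter)

# This condition only looks at one row at a time, so a plain
# boolean mask on the whole frame selects the same rows
row_mask = df[(df.A ** 2 + df.B ** 2) > 1]

print(len(by_group), len(row_mask))
```

The groupby route earns its keep when the condition depends on the group itself, for example comparing each row with its group's mean. Note that DataFrameGroupBy.filter does something different: it keeps or drops whole groups based on a group-level condition rather than filtering rows within each group.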
Select your columns from the DataFrame, then apply your function to them (possibly a lambda expression, depending on your use case).
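# .iloc gives positional access; plain x[0] on a row Series labelled 'date1'/'date2' is deprecated
mask = dog[['date1', 'date2']].apply(lambda x: abs(x.iloc[0] - x.iloc[1]).days < 5, axis=1)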
>>> dog[mask]
For example:
import pandas as pd
df = pd.DataFrame({'date1': pd.date_range(start='2015-1-1', periods=10),
                   'date2': pd.date_range(start='2015-1-1', periods=10)[::-1]})
mask = df[['date1', 'date2']].apply(lambda x: abs(x.iloc[0] - x.iloc[1]).days < 5, axis=1)
>>> df
date1 date2
0 2015-01-01 2015-01-10
1 2015-01-02 2015-01-09
2 2015-01-03 2015-01-08
3 2015-01-04 2015-01-07
4 2015-01-05 2015-01-06
5 2015-01-06 2015-01-05
6 2015-01-07 2015-01-04
7 2015-01-08 2015-01-03
8 2015-01-09 2015-01-02
9 2015-01-10 2015-01-01
>>> df[mask]
date1 date2
3 2015-01-04 2015-01-07
4 2015-01-05 2015-01-06
5 2015-01-06 2015-01-05
6 2015-01-07 2015-01-04
With the DataFrame filtered on the date condition, you can carry on with your analysis.
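The same mask can also be built without a row-wise apply. This sketch rebuilds the example frame and compares the two approaches; the names mask_apply and mask_vec are illustrative. The vectorized version subtracts the columns directly, takes the absolute Timedelta, and reads the whole-day component through the .dt.days accessor.

```python
import pandas as pd

df = pd.DataFrame({'date1': pd.date_range(start='2015-1-1', periods=10),
                   'date2': pd.date_range(start='2015-1-1', periods=10)[::-1]})

# Row-wise version: .iloc for positional access inside each row
mask_apply = df[['date1', 'date2']].apply(
    lambda x: abs(x.iloc[0] - x.iloc[1]).days < 5, axis=1)

# Vectorized version: column subtraction gives a Timedelta Series,
# .abs() drops the sign, .dt.days extracts whole days
mask_vec = (df['date1'] - df['date2']).abs().dt.days < 5

print(df[mask_vec])
```

On large frames the column-wise form is usually much faster, since apply(..., axis=1) calls the Python lambda once per row.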