Create a lambda function with conditions on one df to use in df.apply on another df

Time: 2016-10-06 12:06:26

Tags: python pandas lambda

Consider df

Index   A         B      C
0      20161001   0      24.5
1      20161001   3      26.5
2      20161001   6      21.5
3      20161001   9      29.5
4      20161001   12     20.5
5      20161002   0      30.5
6      20161002   3      22.5
7      20161002   6      25.5
...

Also consider df2

Index Threshold
0     25
1     27
2     29
3     30
4     25
5     30
..

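I want to add a column "Number of Rows" to df2 containing the number of rows in df for which (C > Threshold) & (A >= 20161001) & (A <= 20161002) holds. In other words, the condition involves several columns of df. The expected result is:
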
Index Threshold  Number of Rows 
0     25         4
1     27         2
2     29         2
3     30         1
4     25         4
5     30         1
..

For example, for Threshold=25 in df2, there are 4 rows in df whose "C" value exceeds 25.
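To make the count for Threshold=25 concrete, here is a minimal sketch that rebuilds only the sample rows shown above (columns A and C) and evaluates the three conditions for a single threshold. The variable names are illustrative, not from the question.

```python
import pandas as pd

# Sample rows of df from the question (only the columns the condition uses)
df = pd.DataFrame({
    "A": [20161001] * 5 + [20161002] * 3,
    "C": [24.5, 26.5, 21.5, 29.5, 20.5, 30.5, 22.5, 25.5],
})

threshold = 25
start, end = 20161001, 20161002

# Combine the three conditions into one boolean mask
mask = (df["C"] > threshold) & (df["A"] >= start) & (df["A"] <= end)

# True counts as 1 when summed, so this is the number of matching rows
count = int(mask.sum())
print(df.loc[mask, "C"].tolist())  # [26.5, 29.5, 30.5, 25.5]
print(count)                       # 4
```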

I tried something like this:

def foo(threshold,start,end):
    return len(df[(df['C'] > threshold) & (df['A'] > start) & (df['A'] < end)])

df2['Number of rows'] = df.apply(lambda df2: foo(df2['Threshold'],start = 20161001, end = 20161002),axis=1)

But this fills the Number of Rows column with 0. Why is that?
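One likely reason for the zeros, assuming A holds whole-day integer dates as in the sample: foo() uses strict bounds (A > start and A < end), and no integer lies strictly between 20161001 and 20161002, so every row is filtered out. Separately, df.apply(..., axis=1) iterates over the rows of df, not df2, so the lambda's parameter (named df2) is actually a row of df. The sketch below isolates the bounds issue only.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [20161001] * 5 + [20161002] * 3,
    "C": [24.5, 26.5, 21.5, 29.5, 20.5, 30.5, 22.5, 25.5],
})
start, end = 20161001, 20161002

# Strict bounds, as in foo(): both days are excluded
strict = (df["A"] > start) & (df["A"] < end)

# Inclusive bounds, as stated in the question's condition: both days are kept
inclusive = (df["A"] >= start) & (df["A"] <= end)

print(int(strict.sum()))     # 0
print(int(inclusive.sum()))  # 8
```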

1 Answer:

Answer 0 (score: 2)

You can use boolean indexing together with the sum() aggregation function:

import pandas as pd

# Create the first dataframe (df)
df = pd.DataFrame([[20161001,0 ,24.5],
                   [20161001,3 ,26.5],
                   [20161001,6 ,21.5],
                   [20161001,9 ,29.5],
                   [20161001,12,20.5],
                   [20161002,0 ,30.5],
                   [20161002,3 ,22.5],
                   [20161002,6 ,25.5]],columns=['A','B','C'])

# Create the second dataframe (df2)

df2 = pd.DataFrame(data=[25,27,29,30,25,30],columns=['Threshold'])

start = 20161001
end = 20161002

df2['Number of Rows'] = df2['Threshold'].apply(lambda x : ((df.C > x) & (df.A >= start) & (df.A <= end)).sum())

print(df2['Number of Rows'])

Out[]:
0    4
1    2
2    2
3    1
4    4
5    1
Name: Number of Rows, dtype: int64
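A possible variation, not part of the original answer: the date condition does not depend on the threshold, so it can be applied once. The C comparison can then be done for all thresholds at once with NumPy broadcasting instead of a per-threshold apply. This is a sketch under the assumption that df and df2 look like the samples above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [20161001] * 5 + [20161002] * 3,
    "B": [0, 3, 6, 9, 12, 0, 3, 6],
    "C": [24.5, 26.5, 21.5, 29.5, 20.5, 30.5, 22.5, 25.5],
})
df2 = pd.DataFrame({"Threshold": [25, 27, 29, 30, 25, 30]})
start, end = 20161001, 20161002

# Apply the date condition once; between() is inclusive on both ends by default
c_in_range = df.loc[df["A"].between(start, end), "C"].to_numpy()

# Broadcast: one row per threshold, one column per C value; True where C > threshold
above = c_in_range[np.newaxis, :] > df2["Threshold"].to_numpy()[:, np.newaxis]

# Count the True values in each row, i.e. per threshold
df2["Number of Rows"] = above.sum(axis=1)
print(df2["Number of Rows"].tolist())  # [4, 2, 2, 1, 4, 1]
```

The trade-off is an intermediate boolean matrix of size len(df2) × (number of rows in the date range), which can get large for big frames.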