Consider df:
Index A B C
0 20161001 0 24.5
1 20161001 3 26.5
2 20161001 6 21.5
3 20161001 9 29.5
4 20161001 12 20.5
5 20161002 0 30.5
6 20161002 3 22.5
7 20161002 6 25.5
...
Also consider df2:
Index Threshold
0 25
1 27
2 29
3 30
4 25
5 30
..
I want to add a column "Number of Rows" to df2 that contains the number of rows in df for which (C > Threshold) & (A >= 20161001) & (A <= 20161002) holds. In other words, df2 should become:
Index Threshold Number of Rows
0 25 4
1 27 2
2 29 2
3 30 1
4 25 4
5 30 1
..
For Threshold=25 in df2, there are 4 rows in df whose "C" value exceeds 25.
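That count can be checked directly. A minimal sketch, assuming df holds only the eight rows shown above (the table is truncated, so the real data may give different numbers):

```python
import pandas as pd

# Column C from the eight rows shown above (the table in the question is truncated)
c = pd.Series([24.5, 26.5, 21.5, 29.5, 20.5, 30.5, 22.5, 25.5], name="C")

# Comparing a Series with a scalar yields a boolean Series; sum() counts the True values
count_over_25 = int((c > 25).sum())
print(count_over_25)  # 4
```

All eight visible rows have A within 20161001..20161002, so the date condition does not change this count.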
I tried something like this:
def foo(threshold,start,end):
    return len(df[(df['C'] > threshold) & (df['A'] > start) & (df['A'] < end)])
df2['Number of rows'] = df.apply(lambda df2: foo(df2['Threshold'],start = 20161001, end = 20161002),axis=1)
But this fills the Number of Rows column with 0. Why is that?
Answer 0 (score: 2)
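You can use boolean indexing and the sum() aggregation function: build a boolean mask for each threshold and sum it, since each True counts as 1.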
import pandas as pd

# Create the first dataframe (df)
df = pd.DataFrame([[20161001,0 ,24.5],
                   [20161001,3 ,26.5],
                   [20161001,6 ,21.5],
                   [20161001,9 ,29.5],
                   [20161001,12,20.5],
                   [20161002,0 ,30.5],
                   [20161002,3 ,22.5],
                   [20161002,6 ,25.5]],columns=['A','B','C'])
# Create the second dataframe (df2)
df2 = pd.DataFrame(data=[25,27,29,30,25,30],columns=['Threshold'])
start = 20161001
end = 20161002
# For each threshold, count the rows of df that satisfy all three conditions
df2['Number of Rows'] = df2['Threshold'].apply(lambda x : ((df.C > x) & (df.A >= start) & (df.A <= end)).sum())
print(df2['Number of Rows'])
Out[]:
0 4
1 2
2 2
3 1
4 4
5 1
Name: Number of Rows, dtype: int64
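As for why the original attempt produced 0: the most likely cause is that foo() uses strict comparisons (A > start and A < end), while the stated requirement is inclusive (>= and <=). With start = 20161001 and end = 20161002, no integer lies strictly between the two bounds, so the mask is False for every row. Note also that the call as posted applies over df rather than df2. Since df has no Threshold column, that line would raise a KeyError as written, so the real code presumably used df2.apply. A minimal sketch of the difference, assuming only the eight rows shown in the question:

```python
import pandas as pd

# Only the eight rows visible in the question; the original table is truncated
df = pd.DataFrame(
    [[20161001, 0, 24.5], [20161001, 3, 26.5], [20161001, 6, 21.5],
     [20161001, 9, 29.5], [20161001, 12, 20.5], [20161002, 0, 30.5],
     [20161002, 3, 22.5], [20161002, 6, 25.5]],
    columns=["A", "B", "C"],
)
start, end = 20161001, 20161002

# Strict bounds, as in the question's foo(): nothing is both > start and < end
strict_mask = (df["A"] > start) & (df["A"] < end)
# Inclusive bounds, as in the stated requirement
inclusive_mask = (df["A"] >= start) & (df["A"] <= end)

strict_count = int(strict_mask.sum())
inclusive_count = int(inclusive_mask.sum())
# Series.between is inclusive on both ends by default
between_count = int(df["A"].between(start, end).sum())

print(strict_count, inclusive_count, between_count)  # 0 8 8
```

Switching foo() to >= and <= (or to between) should therefore give the non-zero counts shown above.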