I have two data frames that I want to merge. They look like this:
df_1
unit start_time stop_time
A 0.0 1.2
B 1.3 4.1
A 4.2 4.5
B 4.6 7.2
A 7.3 8.0
df_2
time other_data
0.2 .0122
0.4 .0128
0.6 .0101
0.8 .0091
1.0 .2122
1.2 .1542
1.4 .1546
1.6 .1522
1.8 .2542
2.0 .1557
2.2 .2542
2.4 .1543
2.6 .0121
2.8 .0111
3.0 .0412
3.2 .0214
3.4 .0155
3.6 .0159
3.8 .0154
4.0 .0155
4.2 .0211
4.4 .0265
4.6 .0146
4.8 .0112
5.0 .0166
5.2 .0101
5.4 .0132
5.6 .0112
5.8 .0121
6.0 .0142
6.2 .0124
6.4 .0111
6.6 .0123
6.8 .0111
6.0 .0119
6.2 .0112
6.4 .0131
6.6 .0117
6.8 .0172
7.0 .0123
7.2 .0127
7.4 .0121
7.6 .0110
7.8 .0120
8.0 .0121
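I want to merge these data frames using the following condition:
I want to group all values of df_2.other_data whose df_2.time falls between df_1.start_time and df_1.stop_time. For example, for the first row of df_1, the following rows from df_2 would be grouped: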
time other_data
0.2 .0122
0.4 .0128
0.6 .0101
0.8 .0091
1.0 .2122
1.2 .1542
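Within this group, I want to count the observations where df_2.other_data is above a threshold, which in this case is set to .0120. In this group, 4 observations exceed the threshold. That is the value I want to merge into df_1, so the result should look like this: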
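As a quick sanity check of that count of 4, here is a minimal pandas sketch over just the six rows listed above. It assumes "above the threshold" means strictly greater than .0120; for this particular group, >= gives the same result because no value equals .0120.

```python
import pandas as pd

# The six df_2 rows that fall inside df_1's first interval [0.0, 1.2]
group = pd.DataFrame({
    "time": [0.2, 0.4, 0.6, 0.8, 1.0, 1.2],
    "other_data": [0.0122, 0.0128, 0.0101, 0.0091, 0.2122, 0.1542],
})

threshold = 0.0120
# Strictly greater than the threshold, i.e. "above" it
count_above = int((group["other_data"] > threshold).sum())
print(count_above)  # 4
```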
unit start_time stop_time other_data_above_threshold
A 0.0 1.2 4
The final data frame should look like this:
unit start_time stop_time other_data_above_threshold
A 0.0 1.2 4
B 1.3 4.1 13
A 4.2 4.5 3
B 4.6 7.2 11
A 7.3 8.0 4
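For reference, here is a self-contained sketch that rebuilds both frames from the data above and counts, for every df_1 row, the df_2 observations with start_time <= time <= stop_time and other_data strictly above .0120. Comparing every time against every interval with NumPy broadcasting is a choice made for this sketch, not something the question prescribes. Its memory grows with len(df_1) × len(df_2). The question's df_2 lists times 6.0 to 6.8 twice, and the sketch keeps both copies. On this data it yields 4, 13, 2, 11, 2, so the third and fifth rows of the expected table differ from what the stated rule produces.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({
    "unit": ["A", "B", "A", "B", "A"],
    "start_time": [0.0, 1.3, 4.2, 4.6, 7.3],
    "stop_time": [1.2, 4.1, 4.5, 7.2, 8.0],
})

# Values copied from the question; times 6.0 to 6.8 appear twice there
df_2 = pd.DataFrame({
    "time": [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 1.8, 2.0,
             2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8, 4.0,
             4.2, 4.4, 4.6, 4.8, 5.0, 5.2, 5.4, 5.6, 5.8, 6.0,
             6.2, 6.4, 6.6, 6.8, 6.0, 6.2, 6.4, 6.6, 6.8, 7.0,
             7.2, 7.4, 7.6, 7.8, 8.0],
    "other_data": [0.0122, 0.0128, 0.0101, 0.0091, 0.2122, 0.1542, 0.1546, 0.1522, 0.2542, 0.1557,
                   0.2542, 0.1543, 0.0121, 0.0111, 0.0412, 0.0214, 0.0155, 0.0159, 0.0154, 0.0155,
                   0.0211, 0.0265, 0.0146, 0.0112, 0.0166, 0.0101, 0.0132, 0.0112, 0.0121, 0.0142,
                   0.0124, 0.0111, 0.0123, 0.0111, 0.0119, 0.0112, 0.0131, 0.0117, 0.0172, 0.0123,
                   0.0127, 0.0121, 0.0110, 0.0120, 0.0121],
})

threshold = 0.0120

# (n_obs, 1) times against (n_intervals,) bounds -> (n_obs, n_intervals) boolean matrix
t = df_2["time"].to_numpy()[:, None]
inside = (t >= df_1["start_time"].to_numpy()) & (t <= df_1["stop_time"].to_numpy())

# Keep only observations strictly above the threshold, then count per interval (column)
above = (df_2["other_data"].to_numpy() > threshold)[:, None]
df_1["other_data_above_threshold"] = (inside & above).sum(axis=0)

print(df_1)
```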
Answer 0 (score: 1)
IIUC, this is what you need:
df_1['other_data_at'] = df_1.apply(lambda x: ((df_2['time'] >= x['start_time']) & (df_2['time'] <= x['stop_time'])
                                              & (df_2['other_data'] >= 0.012)).sum(), axis=1)
Output
unit start_time stop_time other_data_at
0 A 0.0 1.2 4
1 B 1.3 4.1 13
2 A 4.2 4.5 2  # your expected output shows 3, but it should be 2
3 B 4.6 7.2 11
4 A 7.3 8.0 3
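Answer 0 and Answer 2 (which uses .gt) disagree on the last row, 3 versus 2. The difference comes from the comparison operator. The observation at time 7.8 is exactly .0120, so it is counted under >= but not under >. A minimal check on the four df_2 rows inside [7.3, 8.0], copied from the question:

```python
import pandas as pd

# df_2 rows inside df_1's last interval [7.3, 8.0]
last_group = pd.DataFrame({
    "time": [7.4, 7.6, 7.8, 8.0],
    "other_data": [0.0121, 0.0110, 0.0120, 0.0121],
})

threshold = 0.012
n_gt = int((last_group["other_data"] > threshold).sum())   # strictly above
n_ge = int((last_group["other_data"] >= threshold).sum())  # above or equal
print(n_gt, n_ge)  # 2 3
```

Under the question's wording ("above the threshold"), > is the matching operator.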
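Answer 1 (score: 0)
Hi, I would try iterating over your df1 and using its values to filter df2. It would look something like this:
def my_counting(df1, df2, threshold):
    count_list = []
    for index in range(len(df1)):
        # df2 rows whose time lies inside this df1 row's [start_time, stop_time]
        window = df2[(df2['time'] >= df1['start_time'].iloc[index]) & (df2['time'] <= df1['stop_time'].iloc[index])]
        # count observations strictly above the threshold
        count_list.append(window[window['other_data'] > threshold].shape[0])
    df1['other_data_above_threshold'] = count_list
    return df1

print(my_counting(df_1, df_2, 0.012))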
Answer 2 (score: 0)
You can try using pd.cut:
import numpy as np
import pandas as pd
a = df_1.start_time.to_list() + [np.inf]
s = pd.cut(df_2.time, bins=a, labels=df_1.index, right=False)
df_1['other_data_above_threshold'] = df_2.other_data.gt(0.012).groupby(s).sum()
Out[213]:
unit start_time stop_time other_data_above_threshold
0 A 0.0 1.2 4.0
1 B 1.3 4.1 13.0
2 A 4.2 4.5 2.0
3 B 4.6 7.2 11.0
4 A 7.3 8.0 2.0
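The pd.cut approach builds its bins from start_time alone, so it ignores stop_time. That works here because every df_2 time lands inside some df_1 interval. A time falling in a gap between one row's stop_time and the next row's start_time would be counted toward the earlier row. The sketch below uses small hypothetical frames (intervals, obs) to show this. It then shows one alternative that honours stop_time, pandas IntervalIndex with closed="both". That alternative assumes the intervals do not overlap, since IntervalIndex.get_indexer refuses overlapping intervals.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: a gap between 1.0 and 2.0 that no interval covers
intervals = pd.DataFrame({"start_time": [0.0, 2.0], "stop_time": [1.0, 3.0]})
obs = pd.DataFrame({"time": [0.5, 1.5, 2.5], "other_data": [0.02, 0.02, 0.02]})
threshold = 0.012

# pd.cut on start_time only: 1.5 falls into [0.0, 2.0) and is attributed to row 0
edges = intervals.start_time.to_list() + [np.inf]
s = pd.cut(obs.time, bins=edges, labels=intervals.index, right=False)
cut_counts = obs.other_data.gt(threshold).groupby(s, observed=False).sum().astype(int).tolist()

# IntervalIndex honours stop_time; get_indexer returns -1 for times in no interval
ii = pd.IntervalIndex.from_arrays(intervals.start_time, intervals.stop_time, closed="both")
pos = ii.get_indexer(obs.time)
keep = (pos >= 0) & obs.other_data.gt(threshold).to_numpy()
interval_counts = np.bincount(pos[keep], minlength=len(intervals)).tolist()

print(cut_counts, interval_counts)  # [2, 1] [1, 1]
```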