I have two dataframes, shown below:
result1
time browncarbon blackcarbon
180.7452 0.506824055392119 0.4693240205237933
180.748 0.5040641475588111 0.4671092323195378
180.7508 0.49911820575405846 0.46344714546409305
180.7535 0.4957944583911674 0.46030629341216533
180.7563 0.4888745617073804 0.45557451231658985
180.7591 0.4864626914800723 0.45633142113414893
180.7619 0.48328511735148877 0.4548510376145042
180.7646 0.484728828747634 0.4572818652186026
180.7674 0.4840750981022636 0.45772491443336777
180.7702 0.4843291425046101 0.4588332952196751
422 rows x 3 columns
result2
start end toc
180.7452 180.7466 192.0
180.7438 180.7452 194.0
180.7424 180.7438 199.0
180.741 180.7424 208.0
180.7396 180.741 229.0
180.7383 180.7396 245.0
180.7369 180.7383 252.0
180.7355 180.7369 245.0
180.7341 180.7355 238.0
180.7327 180.7341 245.0
1364 rows x 3 columns
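When several start/end rows fall within one of the time rows, that time row should still get a single toc value, which should be the mean of the matching toc rows. How can I do this? There is a related answer on Stack Overflow: Merging two pandas dataframes with complex conditions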
result3
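import numpy as np
import pandas as pd
result1['rank'] = np.arange(len(result1))  # remember the original row order
# merge_asof requires both frames to be sorted on their join keys
result3 = pd.merge_asof(result1.sort_values('time'), result2.sort_values('start'), left_on='time', right_on='start')
result3.sort_values('rank').drop(['rank', 'start', 'end'], axis=1)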
time browncarbon blackcarbon toc
180.7452 0.506824055392119 0.4693240205237933
180.748 0.5040641475588111 0.4671092323195378
180.7508 0.49911820575405846 0.46344714546409305
180.7535 0.4957944583911674 0.46030629341216533
180.7563 0.4888745617073804 0.45557451231658985
180.7591 0.4864626914800723 0.45633142113414893
180.7619 0.48328511735148877 0.4548510376145042
180.7646 0.484728828747634 0.4572818652186026
180.7674 0.4840750981022636 0.45772491443336777
180.7702 0.4843291425046101 0.4588332952196751
422 rows x 4 columns
Answer 0 (score: 0)
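Use a cross join to build all row combinations, filter them with boolean indexing and Series.between, aggregate with mean, and finally use DataFrame.join to attach the result to the original: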
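# Cross join: pair every row of result1 with every row of result2.
df = result1.assign(a=1).merge(result2.assign(a=1), on='a', how='outer')
# Keep pairs where time lies in [start, end], then average toc per result1 row.
s = df[df['time'].between(df['start'], df['end'])].groupby(result1.columns.tolist())['toc'].mean()
# Attach the mean toc to the original rows; rows without a match get NaN.
df = result1.join(s, on=result1.columns.tolist())
print(df)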
time browncarbon blackcarbon toc
0 180.7452 0.506824 0.469324 193.0
1 180.7480 0.504064 0.467109 NaN
2 180.7508 0.499118 0.463447 NaN
3 180.7535 0.495794 0.460306 NaN
4 180.7563 0.488875 0.455575 NaN
5 180.7591 0.486463 0.456331 NaN
6 180.7619 0.483285 0.454851 NaN
7 180.7646 0.484729 0.457282 NaN
8 180.7674 0.484075 0.457725 NaN
9 180.7702 0.484329 0.458833 NaN
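As a minimal, self-contained sketch of the steps above, the snippet below rebuilds the first few rows of both frames from the question and runs the same cross join, between filter, and mean. It assumes pandas 1.2 or later and uses merge(how="cross") in place of the dummy-key trick; the names pairs, inside, mean_toc and out are illustrative only.

```python
import pandas as pd

# First rows of each frame, copied (rounded) from the question.
result1 = pd.DataFrame({
    "time": [180.7452, 180.7480, 180.7508],
    "browncarbon": [0.506824, 0.504064, 0.499118],
    "blackcarbon": [0.469324, 0.467109, 0.463447],
})
result2 = pd.DataFrame({
    "start": [180.7452, 180.7438, 180.7424, 180.7410],
    "end": [180.7466, 180.7452, 180.7438, 180.7424],
    "toc": [192.0, 194.0, 199.0, 208.0],
})

# Every (result1 row, result2 row) pair; how="cross" needs pandas >= 1.2.
pairs = result1.merge(result2, how="cross")

# Keep only pairs whose time lies in the closed interval [start, end].
inside = pairs[pairs["time"].between(pairs["start"], pairs["end"])]

# Average toc per original result1 row, then attach it to result1.
keys = result1.columns.tolist()
mean_toc = inside.groupby(keys)["toc"].mean()
out = result1.join(mean_toc, on=keys)
print(out)
```

Series.between includes both endpoints by default, so a time that sits exactly on a shared boundary (180.7452 here) matches two intervals and receives their mean, which is why the first row shows 193.0. Note that the cross join materializes len(result1) * len(result2) rows, that is 422 x 1364 = 575,608 rows for the frames in the question.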
Answer 1 (score: 0)
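jezrael's answer is good, but I would add that grouping by columns that may contain NaN values drops those records. I would group by time only, and then put the resulting Series into a new dataframe: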
df_aux = result1.assign(a=1).merge(result2.assign(a=1), on='a', how='outer')
series_aux = df_aux[df_aux['time'].between(df_aux['start'], df_aux['end'])].groupby('time')['toc'].mean()
This returns a pandas Series, which you can then merge with whatever data from result1 you want to keep.
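The difference between the two groupings can be seen in a small sketch. The data below is hypothetical: the NaN in browncarbon is inserted on purpose, and the sketch assumes the time values in result1 are unique and that pandas 1.2 or later is available for merge(how="cross").

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row has a missing browncarbon value.
result1 = pd.DataFrame({
    "time": [180.7452, 180.7438],
    "browncarbon": [np.nan, 0.504064],
    "blackcarbon": [0.469324, 0.467109],
})
result2 = pd.DataFrame({
    "start": [180.7452, 180.7438, 180.7424],
    "end": [180.7466, 180.7452, 180.7438],
    "toc": [192.0, 194.0, 199.0],
})

pairs = result1.merge(result2, how="cross")
inside = pairs[pairs["time"].between(pairs["start"], pairs["end"])]

# Grouping by every result1 column drops the group whose key contains NaN.
by_all = inside.groupby(result1.columns.tolist())["toc"].mean()

# Grouping by time alone keeps both rows.
by_time = inside.groupby("time")["toc"].mean()

# Attach the per-time mean back to result1 (left join on the time column).
out = result1.join(by_time, on="time")
print(len(by_all), len(by_time))
print(out)
```

Here by_all has one group while by_time has two, so only the time-based grouping gives the NaN row its mean toc. If the time values were not unique, grouping by time would pool rows that share a timestamp, so this approach relies on that assumption.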