I have two pandas DataFrames:
DF1:
ksat muacres SAND SILT CLAY
0 5326 0 0 0
0.1 4346 0 0 0
0.4 4146 0 0 0
0.8 3476 0 0 0
1.2 2006 0 0 0
and DF2:
PERCENTILE ksat b theta
0 1 0.370684 11.55 46.8
1 2 0.558053 11.55 46.8
2 3 0.794836 10.39 46.8
3 4 0.962329 11.55 46.8
4 5 1.202368 10.39 46.8
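I want to add a column 'st' to df1: for each row in df1, find the first ksat value in df2 that is greater than or equal to that row's ksat, and take the corresponding PERCENTILE. For this example, the result would be: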
DF1:
ksat muacres SAND SILT CLAY st
0 5326 0 0 0 1
0.1 4346 0 0 0 1
0.4 4146 0 0 0 2
0.8 3476 0 0 0 4
1.2 2006 0 0 0 5
Currently I do this with a loop inside a loop, which is very inefficient. Is there a better way to do it in pandas?
Thanks!
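Answer 0 (score: 2)
One approach is to merge twice. First do an outer merge of just the ksat and PERCENTILE columns, sorted by ksat, so that you can back-fill: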
In [11]: merged = df1[['ksat']].merge(df2[['ksat', 'PERCENTILE']], how='outer', sort=True)
In [12]: merged
Out[12]:
ksat PERCENTILE
0 0.000000 NaN
1 0.100000 NaN
2 0.370684 1
3 0.400000 NaN
4 0.558053 2
5 0.794836 3
6 0.800000 NaN
7 0.962329 4
8 1.200000 NaN
9 1.202368 5
In [13]: merged.bfill()
Out[13]:
ksat PERCENTILE
0 0.000000 1
1 0.100000 1
2 0.370684 1
3 0.400000 2
4 0.558053 2
5 0.794836 3
6 0.800000 4
7 0.962329 4
8 1.200000 5
9 1.202368 5
Then merge this result back into df1:
In [14]: df1.merge(merged.bfill())
Out[14]:
ksat muacres SAND SILT CLAY PERCENTILE
0 0.0 5326 0 0 0 1
1 0.1 4346 0 0 0 1
2 0.4 4146 0 0 0 2
3 0.8 3476 0 0 0 4
4 1.2 2006 0 0 0 5
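For reference, the two merges above can be put together as a self-contained sketch. The frames are rebuilt from the question's data; the names `filled` and `result`, the rename to 'st', and the cast to int are additions for illustration, and the sketch assumes the ksat values in df1 are unique.

```python
import pandas as pd

# Data copied from the question.
df1 = pd.DataFrame({
    "ksat": [0.0, 0.1, 0.4, 0.8, 1.2],
    "muacres": [5326, 4346, 4146, 3476, 2006],
    "SAND": [0] * 5,
    "SILT": [0] * 5,
    "CLAY": [0] * 5,
})
df2 = pd.DataFrame({
    "PERCENTILE": [1, 2, 3, 4, 5],
    "ksat": [0.370684, 0.558053, 0.794836, 0.962329, 1.202368],
    "b": [11.55, 11.55, 10.39, 11.55, 10.39],
    "theta": [46.8] * 5,
})

# Step 1: outer merge on the shared 'ksat' column, sorted by ksat,
# so every df1 value sits just above the next-larger df2 value.
merged = df1[["ksat"]].merge(df2[["ksat", "PERCENTILE"]], how="outer", sort=True)

# Step 2: back-fill so each df1 ksat picks up the PERCENTILE of the
# first df2 ksat that is greater than or equal to it.
filled = merged.bfill()

# Step 3: merge back onto df1 and rename the column to 'st'
# (the rename and int cast are illustrative additions).
result = df1.merge(filled, on="ksat").rename(columns={"PERCENTILE": "st"})
result["st"] = result["st"].astype(int)

print(result)
```

If a ksat in df1 is larger than every ksat in df2, the back-fill leaves NaN for that row, and the int cast would then fail.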
Answer 1 (score: 2)
You can try numpy.searchsorted (this requires df2.ksat to be sorted in ascending order):
df1['st'] = np.searchsorted(df2.ksat, df1.ksat, side='left') + 1
If the PERCENTILE values are not consecutive ordinals starting at 1, there is one extra step:
idx = np.searchsorted(df2.ksat, df1.ksat, side='left')
df1['st'] = df2.PERCENTILE.iloc[idx].values
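A minimal runnable sketch of the lookup variant follows. The ksat values come from the question, but the PERCENTILE labels 10 to 50 are hypothetical, chosen only to show the non-consecutive case. The `in_range` mask is also an addition: it guards against a df1 value larger than every df2.ksat, for which searchsorted returns len(df2) and a direct lookup would raise an IndexError.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"ksat": [0.0, 0.1, 0.4, 0.8, 1.2]})
df2 = pd.DataFrame({
    "PERCENTILE": [10, 20, 30, 40, 50],  # hypothetical non-consecutive labels
    "ksat": [0.370684, 0.558053, 0.794836, 0.962329, 1.202368],
})

# side='left' returns, for each query value, the position of the first
# df2.ksat that is >= that value (df2.ksat must be sorted ascending).
idx = np.searchsorted(df2["ksat"].to_numpy(), df1["ksat"].to_numpy(), side="left")

# Positions equal to len(df2) have no match; leave those rows as NaN.
in_range = idx < len(df2)
st = np.full(len(df1), np.nan)
st[in_range] = df2["PERCENTILE"].to_numpy()[idx[in_range]]
df1["st"] = st

print(df1)
```

Because searchsorted is a binary search, this avoids both the nested loop and the intermediate merged frame.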