Question

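Is there a fast, clean way to join pandas DataFrame values by their closest index? I have to do this for large dataframes, and the hacks and workarounds I have tried are slow and therefore not very useful.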

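Say I have two dataframes, df and df2. I want to join the values of df2 into df wherever their index and column labels are nearest to each other.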

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,100,size=(4, 6)), 
                index=[1,1.55,3.33,9.88], 
                columns=[1,2.66,4.66,8.33,11.11,12])

df2 = pd.DataFrame(np.random.randint(0,100,size=(2, 3)), 
                index=[1.51,3.31], 
                columns=[2.64,4.65,8.31])

In [23]: df
Out[23]:

         1.00   2.66   4.66   8.33   11.11  12.00
1.00     98     40     28     36     49     92
1.55     52     51     61     64     28     98
3.33     66     33     91     21     24     79
9.88     30     21     13     62     89     22

In [24]: df2
Out[24]:

      2.64  4.65  11.12
1.51   999   999   999
3.31   999   999   999

# The result should look like the following:

         1.00   2.66   4.66   8.33   11.11  12.00
1.00     98     40     28     36     49     92
1.55     52     999    999    55     999    98
3.33     66     999    999    67     999    79
9.88     30     21     13     62     89     22

Answer 1

Setup
Because the OP's dataframes are inconsistent

df = pd.DataFrame(
    1,
    index=[1,1.55,3.33,9.88],
    columns=[1,2.66,4.66,8.33,11.11,12])

df2 = pd.DataFrame(
    999,
    index=[1.51,3.31],
    columns=[2.64,4.65,8.31])

print(df)

      1.00   2.66   4.66   8.33   11.11  12.00
1.00      1      1      1      1      1      1
1.55      1      1      1      1      1      1
3.33      1      1      1      1      1      1
9.88      1      1      1      1      1      1

print(df2)

      2.64  4.65  8.31
1.51   999   999   999
3.31   999   999   999

This is tricky, and I don't have time to explain it. See the docs.

kw = dict(method='nearest', tolerance=.3)
df2.reindex(df.index, **kw).T.reindex(df.columns, **kw).T.combine_first(df)

      1.00   2.66   4.66   8.33   11.11  12.00
1.00    1.0    1.0    1.0    1.0    1.0    1.0
1.55    1.0  999.0  999.0  999.0    1.0    1.0
3.33    1.0  999.0  999.0  999.0    1.0    1.0
9.88    1.0    1.0    1.0    1.0    1.0    1.0
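
Since the one-liner above is terse, here is a step-by-step sketch of what it appears to do, using the same setup frames. The intermediate names rows_aligned and aligned are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame(1, index=[1, 1.55, 3.33, 9.88],
                  columns=[1, 2.66, 4.66, 8.33, 11.11, 12])
df2 = pd.DataFrame(999, index=[1.51, 3.31], columns=[2.64, 4.65, 8.31])

kw = dict(method='nearest', tolerance=.3)

# Step 1: give df2 the row labels of df. Each label in df.index takes the
# row of df2 whose label is nearest, but only if it lies within the
# tolerance; otherwise the row is filled with NaN.
rows_aligned = df2.reindex(df.index, **kw)

# Step 2: reindex acts on the row axis by default, so transpose, apply the
# same nearest-label matching to the column labels, and transpose back.
aligned = rows_aligned.T.reindex(df.columns, **kw).T

# Step 3: aligned now has the shape and labels of df, with NaN wherever
# df2 had nothing close enough. combine_first fills those holes from df.
result = aligned.combine_first(df)
print(result)
```

The NaN placeholders introduced in steps 1 and 2 are why the final result comes back as floats rather than integers. The method='nearest' matching also expects the labels of df2 to be sorted (monotonic) on both axes.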

I would rather do this

df2.stack().reindex_like(df.stack(), **kw)

But I get:

NotImplementedError: method='nearest' not implemented yet for MultiIndex; see GitHub issue 9365

At least it will be there at some point in the future.
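
In the meantime, one possible workaround that avoids both the stack and the double transpose is to look up the nearest positions directly with Index.get_indexer and copy the matched block with NumPy. This is only a sketch under the same setup as above, not part of the original answer; the names tol, row_pos, col_pos, rows, cols and out are made up for illustration. Unlike combine_first, it copies any NaN already present in df2 as-is.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(1, index=[1, 1.55, 3.33, 9.88],
                  columns=[1, 2.66, 4.66, 8.33, 11.11, 12])
df2 = pd.DataFrame(999, index=[1.51, 3.31], columns=[2.64, 4.65, 8.31])

tol = 0.3  # same tolerance as in the reindex version

# For each label of df, the position of the nearest label in df2,
# or -1 when nothing lies within the tolerance.
row_pos = df2.index.get_indexer(df.index, method='nearest', tolerance=tol)
col_pos = df2.columns.get_indexer(df.columns, method='nearest', tolerance=tol)

# Positions in df that found a match on each axis.
rows = np.flatnonzero(row_pos >= 0)
cols = np.flatnonzero(col_pos >= 0)

# Copy the matched block of df2 over the corresponding block of df.
out = df.to_numpy().copy()
out[np.ix_(rows, cols)] = df2.to_numpy()[np.ix_(row_pos[rows], col_pos[cols])]

result = pd.DataFrame(out, index=df.index, columns=df.columns)
print(result)
```

Because no NaN placeholders are created along the way, the integer dtype of df is kept here. Whether this is actually faster on large frames would need to be measured.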
