Question

I have the following two data frames:

DF1 =

       Inflow
0  9810998109
1  5591255912
2  7394273942
3  7866678666
4  1820118202
5  9812198109
6  9810998101
7  4304043040
8  9810998121

DF2 =

       Inflow  mi_to_zcta5
0  3371433756    11.469054
1  1790118201    24.882142

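I want to merge these two data frames on the 'Inflow' column. It is a bit like trying to recreate Excel's VLOOKUP function with approximate match (as shown in this question). But I fail every time. This is the line I have been trying to use: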

test = pd.merge_asof(DF1, DF2, on = 'mi_to_zcta5')

I have tried other settings, such as setting 'allow_exact_matches' to 'False', but without success.

This is the error I get:

 return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)
KeyError: 'mi_to_zcta5'

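I would like to get a data frame with 10 rows, containing the 'Inflow' column plus an additional 'mi_to_zcta5' column holding the corresponding closest value (if possible), just like an approximate-match VLOOKUP in Excel.

Thanks!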

Answer 1

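Here is your solution:

Your first DataFrame (df1) has only one column, while the second (df2) has two. When you call pd.merge you need to choose how='outer', which uses the union of the keys from both frames. That means every 'Inflow' value from either frame appears in the result, and wherever a column has no value for that key it is filled with NaN.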

>>> df1
       Inflow
0  9810998109
1  5591255912
2  7394273942
3  7866678666
4  1820118202
5  9812198109
6  9810998101
7  4304043040
8  9810998121
>>> df2
       Inflow  mi_to_zcta5
0  3371433756    11.469054
1  1790118201    24.882142
>>> pd.merge( df1, df2, on=['Inflow'], how='outer')
        Inflow  mi_to_zcta5
0   9810998109          NaN
1   5591255912          NaN
2   7394273942          NaN
3   7866678666          NaN
4   1820118202          NaN
5   9812198109          NaN
6   9810998101          NaN
7   4304043040          NaN
8   9810998121          NaN
9   3371433756    11.469054
10  1790118201    24.882142

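Note: you cannot merge on the key 'mi_to_zcta5', because that column exists only in df2 and not in df1. That is what the KeyError: 'mi_to_zcta5' is telling you.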
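If what you are after is the approximate match described in the question rather than a union of keys, pd.merge_asof can be keyed on 'Inflow', the column both frames share. The following is only a minimal sketch under that assumption: the names left, right and result are illustrative, and direction='nearest' is my reading of "closest value". merge_asof requires both frames to be sorted by the key, so the original row order is saved first and restored afterwards.

```python
import pandas as pd

df1 = pd.DataFrame({"Inflow": [9810998109, 5591255912, 7394273942,
                               7866678666, 1820118202, 9812198109,
                               9810998101, 4304043040, 9810998121]})
df2 = pd.DataFrame({"Inflow": [3371433756, 1790118201],
                    "mi_to_zcta5": [11.469054, 24.882142]})

# merge_asof needs both inputs sorted by the key column.
# reset_index() keeps the original row position in an "index" column.
left = df1.reset_index().sort_values("Inflow")
right = df2.sort_values("Inflow")

# For each Inflow in df1, take the df2 row whose Inflow is closest.
result = pd.merge_asof(left, right, on="Inflow", direction="nearest")

# Restore the original df1 row order and drop the helper column.
result = (result.sort_values("index")
                .drop(columns="index")
                .reset_index(drop=True))

# Row 4 (1820118202) is closest to 1790118201, so it gets 24.882142;
# every other row is closest to 3371433756 and gets 11.469054.
print(result)
```

The default, direction='backward', takes the last df2 key that is less than or equal to each df1 value, which is closer to how VLOOKUP's approximate match behaves on a sorted table. For this particular data both settings give the same result.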
