Question

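I need to merge two points together when they are within +2.75 and -2.75 of each other. The two points are in two separate dataframes (both were processed in earlier steps of the pipeline).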

In this case, I thought merge_asof could give me a tolerance of 2.75 and the 'nearest' direction.
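For reference, here is a minimal sketch of what `tolerance` and `direction='nearest'` do when the asof key is numeric. The frames and the `right_value` column are hypothetical toy data, not the question's real frames:

```python
import pandas as pd

# Hypothetical toy data: the numeric Section_location column is the asof key.
left = pd.DataFrame({"Section_location": [397.0, 659.0]})
right = pd.DataFrame({"Section_location": [396.0, 402.5],
                      "right_value": ["r1", "r2"]})

# Both frames must be sorted by the `on` column before calling merge_asof.
result = pd.merge_asof(
    left.sort_values("Section_location"),
    right.sort_values("Section_location"),
    on="Section_location",
    direction="nearest",  # consider the closest key on either side
    tolerance=2.75,       # drop matches farther than 2.75 away
)
print(result)
```

Here 397.0 picks up the row at 396.0 (distance 1.0). The closest key to 659.0 is far more than 2.75 away, so its `right_value` comes back as NaN.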

However, I get an error:

MergeError: key must be integer, timestamp or float

Here is one of the two dataframes:

           Unnamed: 0      Section_id  Section_location
    36015       36015  055_305AR_10.8             397.0
    7344         7344  055_305AR_10.8             659.0

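I also have a second dataframe, which likewise has Section_id and Section_location columns, with location values such as 402.5. So in this example, I want to merge a row when the second dataframe's Section_location is greater than or equal to 394.25 and less than or equal to 399.75.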
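The window in this example is 397.0 ± 2.75. A quick check of the bounds, using a hypothetical helper `within_tolerance` that mirrors the inclusive greater-or-equal / less-or-equal conditions above:

```python
TOLERANCE = 2.75  # the +/-2.75 window from the question

def within_tolerance(a: float, b: float, tol: float = TOLERANCE) -> bool:
    """Hypothetical helper: True when b lies in the closed range [a - tol, a + tol]."""
    return a - tol <= b <= a + tol

low, high = 397.0 - TOLERANCE, 397.0 + TOLERANCE
print(low, high)                       # 394.25 399.75
print(within_tolerance(397.0, 402.5))  # False: 402.5 lies outside the window
```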

I have also sorted the values of both dataframes by Section_id and Section_location.
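Note that merge_asof requires both frames to be sorted by the `on` column as a whole, even when `by` is also given. Sorting by Section_id first and Section_location second does not guarantee that order. A quick check with a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame, sorted the way the question describes: id first, then location.
df = pd.DataFrame({
    "Section_id": ["A", "A", "B", "B"],
    "Section_location": [397.0, 659.0, 10.0, 402.5],
})
by_id_then_loc = df.sort_values(["Section_id", "Section_location"])
print(by_id_then_loc["Section_location"].is_monotonic_increasing)  # False

# Sorting on the asof key alone gives the global order merge_asof expects.
by_loc = df.sort_values("Section_location")
print(by_loc["Section_location"].is_monotonic_increasing)  # True
```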

I tried the following code, but got an error.

    def mergeasof_dfs(df1, df2):

        return pd.merge_asof(left = df1, right = df2, 
                             on='Section_id', 
                             by='Section_location',
                             tolerance = 2.75,
                             direction = 'nearest'
                            )

---------------------------------------------------------------------------
MergeError                                Traceback (most recent call last)
<ipython-input-66-641a0dfae9af> in <module>
----> 1 test = mergeasof_dfs(df1, df2)

<ipython-input-65-bc88146fa086> in mergeasof_dfs(df1, df2)
      5                          by='Section_location',
      6                          tolerance = 2.75,
----> 7                          direction = 'nearest'
      8                         )

Error:

MergeError: key must be integer, timestamp or float
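In merge_asof, `on` is the column compared approximately (it must be numeric, here Section_location) and `by` is the column that must match exactly (here the string Section_id). The call above passes them the other way round, so the string column becomes the asof key, which is most likely what this MergeError is complaining about. A sketch with the two swapped, assuming Section_location holds numbers in both frames (the `astype(float)` only guards against one side being stored as integers):

```python
import pandas as pd

def mergeasof_dfs(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    # `on`: numeric column used for the +/-2.75 window (must be sorted).
    # `by`: string id that has to match exactly between the two frames.
    left = df1.astype({"Section_location": float}).sort_values("Section_location")
    right = df2.astype({"Section_location": float}).sort_values("Section_location")
    return pd.merge_asof(
        left, right,
        on="Section_location",
        by="Section_id",
        tolerance=2.75,
        direction="nearest",
        suffixes=("_df1", "_df2"),  # only used if other column names overlap
    )
```

With a hypothetical second frame holding 399.0 and 402.5 for the same Section_id, the row at 397.0 would pair with 399.0, and 402.5 would be ignored because it lies 5.5 away.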

Answer 1

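One workable solution is to create a helper integer column for the merge: first stack the DataFrames with concat and its keys parameter, then create a new column of integers with factorize: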

import pandas as pd

df1 = pd.DataFrame({'Section_location': list('abcymdc')})
df2 = pd.DataFrame({'Section_location': list('abhucda')})


df3 = pd.concat([df1[['Section_location']],df2[['Section_location']]], keys=('df1','df2'))
df3['Section_id_new'] = pd.factorize(df3['Section_location'])[0]
print (df3)
      Section_location  Section_id_new
df1 0                a               0
    1                b               1
    2                c               2
    3                y               3
    4                m               4
    5                d               5
    6                c               2
df2 0                a               0
    1                b               1
    2                h               6
    3                u               7
    4                c               2
    5                d               5
    6                a               0

df1['Section_id_new'] = df3.loc['df1', 'Section_id_new']
print (df1)
df2['Section_id_new'] = df3.loc['df2', 'Section_id_new']
print (df2)
  Section_location  Section_id_new
0                a               0
1                b               1
2                c               2
3                y               3
4                m               4
5                d               5
6                c               2
  Section_location  Section_id_new
0                a               0
1                b               1
2                h               6
3                u               7
4                c               2
5                d               5
6                a               0

So your solution would be:

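    import pandas as pd

    def mergeasof_dfs(df1, df2):
        # Stack both location columns, tagging each block with its source frame
        df3 = pd.concat([df1[['Section_location']], df2[['Section_location']]], keys=('df1', 'df2'))
        # Give every distinct Section_location an integer code shared by both frames
        df3['Section_id_new'] = pd.factorize(df3['Section_location'])[0]
        # Copy the codes back to each frame (aligned on the original index)
        df1['Section_id_new'] = df3.loc['df1', 'Section_id_new']
        df2['Section_id_new'] = df3.loc['df2', 'Section_id_new']

        df = pd.merge_asof(left=df1, right=df2,
                           on='Section_id_new',
                           by='Section_location',
                           tolerance=2.75,
                           direction='nearest')
        # Drop the helper column before returning
        return df.drop('Section_id_new', axis=1)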
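pd.factorize returns integer codes, so Section_id_new is an int64 column, and merge_asof checks `tolerance` against the dtype of the `on` key. Before relying on the helper column, it may be worth confirming how a float tolerance behaves on an integer key. A small sketch with hypothetical frames (already sorted by the key):

```python
import pandas as pd

# Hypothetical frames with an int64 key, like the factorized helper column.
left = pd.DataFrame({"Section_id_new": [0, 1, 4]})
right = pd.DataFrame({"Section_id_new": [0, 3, 9], "value": ["a", "c", "d"]})

try:
    # A float tolerance on an integer `on` key is rejected.
    pd.merge_asof(left, right, on="Section_id_new",
                  tolerance=2.75, direction="nearest")
    float_tolerance_ok = True
except pd.errors.MergeError as exc:
    float_tolerance_ok = False
    print("MergeError:", exc)

# An integer tolerance is accepted for the same integer key.
ok = pd.merge_asof(left, right, on="Section_id_new",
                   tolerance=2, direction="nearest")
print(ok)
```

In this sketch the `tolerance=2.75` call raises MergeError, while `tolerance=2` matches keys 0, 1 and 4 to the right-hand rows at 0, 0 and 3.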
