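I need to merge two points when they lie within ±2.75 of each other. The two points are in two separate dataframes (both are processed in earlier steps of a pipeline).
In this case I thought merge_asof would do this for me, with a tolerance of 2.75 and direction 'nearest'.
However, I get an error:
MergeError: key must be integer, timestamp or float
Here is one of the two dataframes: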
       Unnamed: 0      Section_id  Section_location
36015       36015  055_305AR_10.8             397.0
7344         7344  055_305AR_10.8             659.0
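I also have a second dataframe, which likewise has Section_id and Section_location columns, with locations such as 402.5. So for the first row in this example (397.0), I want to merge if the Section_location in the second dataframe is greater than or equal to 394.25 and less than or equal to 399.75.
I have also sorted both dataframes by Section_id and Section_location.
I tried the following code, but it raises an error.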
def mergeasof_dfs(df1, df2):
    return pd.merge_asof(left = df1, right = df2,
                         on='Section_id',
                         by='Section_location',
                         tolerance = 2.75,
                         direction = 'nearest'
                         )
---------------------------------------------------------------------------
MergeError Traceback (most recent call last)
<ipython-input-66-641a0dfae9af> in <module>
----> 1 test = mergeasof_dfs(df1, df2)
<ipython-input-65-bc88146fa086> in mergeasof_dfs(df1, df2)
5 by='Section_location',
6 tolerance = 2.75,
----> 7 direction = 'nearest'
8 )
Error:
MergeError: key must be integer, timestamp or float
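Answer 0 (score: 0)
One possible solution is to create a helper integer column for merging: first build a single DataFrame with concat, using the keys parameter, then create a new column of integers with factorize: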
import pandas as pd

df1 = pd.DataFrame({
    'Section_location':list('abcymdc'),
})
df2 = pd.DataFrame({
    'Section_location':list('abhucda'),
})
df3 = pd.concat([df1[['Section_location']],df2[['Section_location']]], keys=('df1','df2'))
df3['Section_id_new'] = pd.factorize(df3['Section_location'])[0]
print (df3)
      Section_location  Section_id_new
df1 0                a               0
    1                b               1
    2                c               2
    3                y               3
    4                m               4
    5                d               5
    6                c               2
df2 0                a               0
    1                b               1
    2                h               6
    3                u               7
    4                c               2
    5                d               5
    6                a               0
df1['Section_id_new'] = df3.loc['df1', 'Section_id_new']
print (df1)
df2['Section_id_new'] = df3.loc['df2', 'Section_id_new']
print (df2)
  Section_location  Section_id_new
0                a               0
1                b               1
2                c               2
3                y               3
4                m               4
5                d               5
6                c               2
  Section_location  Section_id_new
0                a               0
1                b               1
2                h               6
3                u               7
4                c               2
5                d               5
6                a               0
So your solution is:
def mergeasof_dfs(df1, df2):
    df3 = pd.concat([df1[['Section_location']],df2[['Section_location']]], keys=('df1','df2'))
    df3['Section_id_new'] = pd.factorize(df3['Section_location'])[0]
    df1['Section_id_new'] = df3.loc['df1', 'Section_id_new']
    df2['Section_id_new'] = df3.loc['df2', 'Section_id_new']
    df = pd.merge_asof(left = df1, right = df2,
                       on='Section_id_new',
                       by='Section_location',
                       tolerance = 2.75,
                       direction = 'nearest'
                       )
    return df.drop('Section_id_new', axis=1)
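An alternative worth trying, sketched under assumptions: in merge_asof the on key is the sorted numeric column that is matched by nearness, while by is the column that must match exactly. In the question these two appear to be swapped (on is the string Section_id), which would explain the MergeError. The sketch below uses Section_location as the numeric key and Section_id as the grouping key. The sample values in df2, the value column, the helper name merge_nearest, and the renamed column Section_location_right are all made up for illustration.

```python
import pandas as pd

# Hypothetical sample data modeled on the question's columns; the values in
# df2 and the "value" column are invented for illustration.
df1 = pd.DataFrame({
    "Section_id": ["055_305AR_10.8"] * 3,
    "Section_location": [397.0, 500.0, 659.0],
})
df2 = pd.DataFrame({
    "Section_id": ["055_305AR_10.8"] * 3,
    "Section_location": [395.0, 402.5, 660.0],
    "value": ["a", "b", "c"],
})


def merge_nearest(left, right, tolerance=2.75):
    # merge_asof requires both frames to be sorted by the numeric key.
    left = left.sort_values("Section_location")
    # Rename the right-hand key so both locations are kept in the result.
    right = right.sort_values("Section_location").rename(
        columns={"Section_location": "Section_location_right"}
    )
    return pd.merge_asof(
        left,
        right,
        left_on="Section_location",
        right_on="Section_location_right",
        by="Section_id",        # must match exactly
        tolerance=tolerance,    # maximum allowed distance on the numeric key
        direction="nearest",
    )


result = merge_nearest(df1, df2)
print(result)
```

With this sample data, the row at 397.0 pairs with 395.0, the row at 659.0 pairs with 660.0, and the row at 500.0 gets NaN in the right-hand columns because nothing lies within 2.75 of it. Note that the frames are sorted by Section_location alone, not by Section_id first, since merge_asof checks that the numeric key is sorted across the whole frame.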