I have the following dataframe, df1:
X Y Order_ NEW_ID
0 484970.4517 408844.0920 95083 1320437
1 478512.3233 415791.5395 96478 1320727
2 504516.3032 452923.4420 105246 1321260
3 485147.0529 428172.1055 99633 1320979
And another one, df2:
Order_ Loc
0 83158 239,211
1 83159 239,212
2 83160 239,213
3 83161 239,214
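I want to merge it into the first one so that df1 gets a Loc column with the correct values. To do this, I use map to perform a left merge, after first converting the Loc values to strings: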
df2['Loc'] = df2['Loc'].astype(str)
df1['Loc']=df1.Order_.map(df2.Loc)
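Why this line yields NaN can be shown with a cut-down copy of the two frames (only the Order_ and Loc columns, values taken from the post). When map is given a Series, it looks each value up in that Series' index. Here that index is df2's default 0..3 index, not its Order_ column, so none of df1's order numbers can match:

```python
import pandas as pd

# Cut-down frames mirroring the question's df1 / df2
df1 = pd.DataFrame({"Order_": [95083, 96478, 105246, 99633]})
df2 = pd.DataFrame({"Order_": [83158, 83159, 83160, 83161],
                    "Loc": ["239,211", "239,212", "239,213", "239,214"]})

# df2.Loc is indexed by the default RangeIndex, not by Order_
print(df2.Loc.index.tolist())   # [0, 1, 2, 3]

# Each Order_ value is looked up among 0..3, so every lookup misses
result = df1.Order_.map(df2.Loc)
print(result.isna().all())      # True
```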
The result is strange: the Loc values that appear in df1 are all NaN:
X Y Order_ NEW_ID Loc
0 484970.4517 408844.0920 95083 1320437 NaN
1 478512.3233 415791.5395 96478 1320727 NaN
2 504516.3032 452923.4420 105246 1321260 NaN
3 485147.0529 428172.1055 99633 1320979 NaN
whereas I expect them to be strings that look like 239,211 (i.e. strings containing a comma). When I check the dtypes of df2, I get:
Order_ int64
Loc object
dtype: object
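The object dtype shown for Loc is not itself the problem: in the pandas versions this question dates from, a column of Python strings is reported as object. A small check, assuming the same sample Loc values, confirms every element is still a str after astype(str):

```python
import pandas as pd

df2 = pd.DataFrame({"Order_": [83158, 83159, 83160, 83161],
                    "Loc": ["239,211", "239,212", "239,213", "239,214"]})
df2["Loc"] = df2["Loc"].astype(str)

# Inspect the column dtype (object in older pandas; newer versions may use a string dtype)
print(df2["Loc"].dtype)

# Every element is an ordinary Python string containing the comma
all_str = all(isinstance(v, str) for v in df2["Loc"])
print(all_str)
```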
My question: how do I convert between these types so that the Loc values are read correctly and do not end up as NaN?
Answer (score: 1)
I think you may first need to cast Order_ to int, if the Order_ dtypes differ between the two DataFrames:
df1['Order_'] = df1['Order_'].astype(int)
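To illustrate why matching dtypes matter, here is a hypothetical case (not from the post) where df1.Order_ was read as text while df2.Order_ is int64. The string "95083" never equals the integer 95083, so every lookup misses until the cast:

```python
import pandas as pd

# Hypothetical: df1.Order_ holds strings, df2.Order_ holds integers
df1 = pd.DataFrame({"Order_": ["95083", "96478"]})
df2 = pd.DataFrame({"Order_": [95083, 83159], "Loc": ["239,211", "239,212"]})

# Lookup Series indexed by the integer order numbers
lookup = df2.set_index("Order_")["Loc"]

# String keys do not match integer keys, so nothing is found
before = df1["Order_"].map(lookup)
print(before.tolist())   # [nan, nan]

# After casting, the key dtypes agree and 95083 is found
df1["Order_"] = df1["Order_"].astype(int)
after = df1["Order_"].map(lookup)
print(after.tolist())    # ['239,211', nan]
```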
But maybe the real problem is that map needs a Series or dict keyed by Order_, so Order_ has to be set as the index first:
d = df2.set_index('Order_')['Loc'].to_dict()
df1['Loc']= df1.Order_.map(d)
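Two equivalent ways to do the same lookup, sketched on a two-row excerpt of the data: map can take the Order_-indexed Series directly (the to_dict() step is optional), or you can express the "left merge" from the question literally with DataFrame.merge. The sketch assumes Order_ values in df2 are unique; with duplicates, merge would return one row per match rather than a single value:

```python
import pandas as pd

df1 = pd.DataFrame({"X": [484970.4517, 478512.3233],
                    "Order_": [95083, 96478]})
df2 = pd.DataFrame({"Order_": [95083, 83159],
                    "Loc": ["239,211", "239,212"]})

# Option 1: map with a Series whose index is Order_
df1["Loc_map"] = df1["Order_"].map(df2.set_index("Order_")["Loc"])

# Option 2: an explicit left merge on the key column
merged = df1.merge(df2, on="Order_", how="left")
print(merged)
```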
Sample:
print (df1)
X Y Order_ NEW_ID
0 484970.4517 408844.0920 95083 1320437
1 478512.3233 415791.5395 96478 1320727
2 504516.3032 452923.4420 105246 1321260
3 485147.0529 428172.1055 99633 1320979
print (df2)
Order_ Loc
0 95083 239,211 <- first value changed so that it matches a key in df1
1 83159 239,212
2 83160 239,213
3 83161 239,214
#check if same dtypes
print (df1['Order_'].dtypes)
int64
print (df2['Order_'].dtypes)
int64
d = df2.set_index('Order_')['Loc'].to_dict()
print (d)
{83160: '239,213', 83161: '239,214', 95083: '239,211', 83159: '239,212'}
df1['Loc']= df1.Order_.map(d)
print (df1)
X Y Order_ NEW_ID Loc
0 484970.4517 408844.0920 95083 1320437 239,211
1 478512.3233 415791.5395 96478 1320727 NaN
2 504516.3032 452923.4420 105246 1321260 NaN
3 485147.0529 428172.1055 99633 1320979 NaN
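The remaining NaN rows simply have no matching Order_ in df2. One way to list those keys, assuming the same sample values, is a left merge with indicator=True, which adds a _merge column set to "both" or "left_only":

```python
import pandas as pd

df1 = pd.DataFrame({"Order_": [95083, 96478, 105246, 99633]})
df2 = pd.DataFrame({"Order_": [95083, 83159, 83160, 83161],
                    "Loc": ["239,211", "239,212", "239,213", "239,214"]})

# indicator=True records where each row's key was found
checked = df1.merge(df2, on="Order_", how="left", indicator=True)

# Keys present in df1 but absent from df2 are the ones that became NaN
unmatched = checked.loc[checked["_merge"] == "left_only", "Order_"].tolist()
print(unmatched)   # [96478, 105246, 99633]
```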