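I have a pandas DataFrame with geographic locations, and I am trying to create a new column by applying a function that fetches the Walk Score for each location.
Here is my DataFrame: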
df_test[['latitude', 'longitude']]
latitude longitude
0 50.673170 -120.322639
1 50.669597 -120.341833
2 50.650727 -120.150661
3 50.687545 -120.297688
4 50.772361 -122.811211
5 50.882304 -119.865000
6 50.643431 -120.362385
7 50.707459 -120.376297
8 50.708614 -120.409419
9 50.697850 -120.389101
10 50.659250 -119.998597
When I test the function on a single pair of values, everything works:
walkscore(df_test['latitude'][0], df_test['longitude'][0], key)
71
But when I try to apply the function to the whole dataset as follows, I get an error:
df_test.loc['walkscore'] = df_test.loc[['latitude', 'longitude']].\
apply(lambda x:
walkscore(x['latitude'], x['longitude'], apikey), axis='columns')
KeyError: "None of [Index(['latitude', 'longitude'], dtype='object')] are in the [index]"
I tried resetting the index, but that did not help. Am I doing something wrong here?
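Answer 0: (score: 1)
Remove loc, because you need to select columns, not index values. A single list passed to .loc is looked up in the row index, which is why the KeyError says the labels are not in the [index]: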
df_test['walkscore'] = df_test.\
apply(lambda x: walkscore(x['latitude'], x['longitude'], apikey), axis='columns')
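To make the difference concrete, here is a minimal sketch on a two-row frame built from the first rows of the question's data. The variable names (df, raised_key_error, by_brackets, by_loc) are made up for illustration and do not come from the original post.

```python
import pandas as pd

# Small illustrative frame; the row labels are 0 and 1.
df = pd.DataFrame({
    "latitude": [50.673170, 50.669597],
    "longitude": [-120.322639, -120.341833],
})

# A single list passed to .loc is interpreted as ROW labels.
# No row is labelled 'latitude' or 'longitude', so the lookup fails.
try:
    df.loc[["latitude", "longitude"]]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Plain brackets with a list select COLUMNS.
by_brackets = df[["latitude", "longitude"]]

# .loc can select columns too, if a row selector comes first.
by_loc = df.loc[:, ["latitude", "longitude"]]

print(raised_key_error)
print(by_brackets.equals(by_loc))
```

The same applies on the left-hand side of the assignment: df_test.loc['walkscore'] = ... would target a row labelled 'walkscore', whereas df_test['walkscore'] = ... creates a column.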
Verified with a sample function:
apikey = 'aaa'
# Stand-in for the real walkscore function: it just returns its inputs
def walkscore(x, y, apikey):
    return (x, y)
df_test['walkscore'] = df_test.\
apply(lambda x: walkscore(x['latitude'], x['longitude'], apikey), axis='columns')
print(df_test)
latitude longitude walkscore
0 50.673170 -120.322639 (50.67317, -120.322639)
1 50.669597 -120.341833 (50.669596999999996, -120.34183300000001)
2 50.650727 -120.150661 (50.650727, -120.15066100000001)
3 50.687545 -120.297688 (50.687545, -120.297688)
4 50.772361 -122.811211 (50.772361, -122.81121100000001)
5 50.882304 -119.865000 (50.882304, -119.865)
6 50.643431 -120.362385 (50.643431, -120.362385)
7 50.707459 -120.376297 (50.707459, -120.376297)
8 50.708614 -120.409419 (50.708614000000004, -120.409419)
9 50.697850 -120.389101 (50.69785, -120.389101)
10 50.659250 -119.998597 (50.65925, -119.998597)
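As a possible variation, the same column can be built without a row-wise apply by zipping the two columns together. This is only a sketch: walkscore below is the same stand-in as above, not the real Walk Score function, and the two-row frame is an illustrative subset of the data.

```python
import pandas as pd

apikey = "aaa"

def walkscore(lat, lon, apikey):
    # Stand-in for the real function (assumption): returns its inputs.
    return (lat, lon)

df_test = pd.DataFrame({
    "latitude": [50.673170, 50.669597],
    "longitude": [-120.322639, -120.341833],
})

# Iterate over both columns in parallel and call the function once per row.
df_test["walkscore"] = [
    walkscore(lat, lon, apikey)
    for lat, lon in zip(df_test["latitude"], df_test["longitude"])
]

print(df_test)
```

If the real function makes a network request per location, the running time will likely be dominated by those requests either way, so the choice between apply and zip is mostly a matter of readability.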