Question

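I am doing some geocoding work in which I use selenium to screen-scrape the x-y coordinates for the address of a location. I imported an xls file into a pandas DataFrame and want to use an explicit loop to update the rows that do not have x-y coordinates, like below: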

for index, row in rche_df.iterrows():
    if isinstance(row.wgs1984_latitude, float):
        row = row.copy()
        target = row.address_chi        
        dict_temp = geocoding(target)
        row.wgs1984_latitude = dict_temp['lat']
        row.wgs1984_longitude = dict_temp['long']

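I have read Why doesn't this function "take" after I iterrows over a pandas DataFrame? and I am fully aware that iterrows only gives us a view rather than a copy for editing, but what if I really do need to update the values row by row? Is a lambda feasible?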

Answer 1

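The rows you get back from iterrows are copies that are no longer connected to the original DataFrame, so edits to them do not change your DataFrame. Thankfully, because each item you get back from iterrows contains the current index, you can use that index to access and edit the relevant row of the DataFrame: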

for index, row in rche_df.iterrows():
    # As in the question, a float here is taken to mean a missing coordinate
    if isinstance(row.wgs1984_latitude, float):
        target = row.address_chi
        dict_temp = geocoding(target)
        # Write through the DataFrame itself, not through the row copy
        rche_df.loc[index, 'wgs1984_latitude'] = dict_temp['lat']
        rche_df.loc[index, 'wgs1984_longitude'] = dict_temp['long']
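
To see the difference between the two kinds of assignment, here is a minimal sketch on a toy frame. The data and the placeholder coordinate are made up for illustration, and missing coordinates are assumed to be stored as NaN; only the column names come from the question.

```python
import pandas as pd

# Toy frame for illustration only; column names follow the question.
df = pd.DataFrame({
    "address_chi": ["addr A", "addr B"],
    "wgs1984_latitude": [22.28, float("nan")],
})

# Assigning to `row` changes only the per-row Series that iterrows built.
for index, row in df.iterrows():
    row["wgs1984_latitude"] = 0.0
after_row_edit = df["wgs1984_latitude"].tolist()

# Assigning through df.loc with the yielded index changes the frame itself.
for index, row in df.iterrows():
    if pd.isna(row["wgs1984_latitude"]):
        df.loc[index, "wgs1984_latitude"] = 22.30  # placeholder value
after_loc_edit = df["wgs1984_latitude"].tolist()
```

After the first loop the frame still holds its original values, including the NaN; only the second loop fills the gap.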

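In my experience, this approach seems slower than using a method like apply or map, but as always, it is up to you to decide how to trade off performance against ease of coding.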
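
For comparison, an apply-based version might look like the sketch below. It assumes missing coordinates are stored as NaN, and it replaces the asker's Selenium-based geocoding function with a hypothetical stub (FAKE_COORDS and the sample addresses are invented) so that the example runs on its own.

```python
import pandas as pd

# Hypothetical lookup table standing in for the real scraper's results.
FAKE_COORDS = {"addr B": {"lat": 22.30, "long": 114.17}}

def geocoding(address):
    # Stub with the same return shape as in the question: a dict
    # with 'lat' and 'long' keys.
    return FAKE_COORDS[address]

rche_df = pd.DataFrame({
    "address_chi": ["addr A", "addr B"],
    "wgs1984_latitude": [22.28, float("nan")],
    "wgs1984_longitude": [114.15, float("nan")],
})

# Select only the rows that still lack a latitude.
missing = rche_df["wgs1984_latitude"].isna()

# Geocode each missing address once; the result is a Series of dicts
# that keeps the index labels of the selected rows.
results = rche_df.loc[missing, "address_chi"].apply(geocoding)

# Assign back through .loc; pandas aligns the values on the index.
rche_df.loc[missing, "wgs1984_latitude"] = results.map(lambda d: d["lat"])
rche_df.loc[missing, "wgs1984_longitude"] = results.map(lambda d: d["long"])
```

Rows that already have coordinates are never passed to geocoding, which matters when each lookup drives a browser.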

Update value in pandas with iterrows
