Question

I have the following script:

import pandas as pd

gdf = pd.read_csv('Geolocation_main')
print gdf['Geolocation'][:5]

which outputs:

0    (50.673675, -120.298973)
1    (50.678354, -120.329258)
2    (50.672496, -120.333317)
3    (50.673359, -120.332912)
4     (50.673411, -120.32978)

print type(gdf['Geolocation'][0])
<type 'str'>

I need to swap the geographic coordinates in each cell, e.g. (-120.298973, 50.673675).

To do that, I wrote the following script:

correct = []

for u in gdf['Geolocation']:
    u = u.replace('(', '')
    u = u.replace(')', '')
    a, b = u.split(',')
    correct = b, a
    gdf['Geolocation_correct'] = correct
    print gdf['Geolocation_correct']

But it gives me an error: ValueError: Length of values does not match length of index. What am I doing wrong here?

Answer 1

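I think the error you are getting comes from this line inside the loop:

gdf['Geolocation_correct'] = correct

At that point correct is a 2-tuple (b, a). pandas treats it as the list of values for the whole column, and its length (2) does not match the number of rows in gdf. The loop also overwrites correct on every pass instead of collecting one swapped value per row.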
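To see where a ValueError like the one in the question can come from, here is a minimal sketch. It assumes a hypothetical three-row frame in place of the asker's CSV and is written so that it also runs on Python 3; the exact wording of the message may differ between pandas versions.

```python
import pandas as pd

# Hypothetical three-row frame standing in for the asker's data.
gdf = pd.DataFrame({'Geolocation': ['(50.673675, -120.298973)',
                                    '(50.678354, -120.329258)',
                                    '(50.672496, -120.333317)']})

# A 2-tuple, like `b, a` inside the loop in the question.
correct = ('-120.298973', '50.673675')

try:
    # pandas reads the tuple as two column values, but the frame has three rows.
    gdf['Geolocation_correct'] = correct
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The assignment only works when the right-hand side is a scalar or has exactly one value per row, which is why building the whole column first (or using map) avoids the problem.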

You could do this instead:

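def fix_geo_location(u):
    # Drop the surrounding parentheses, then split into the two coordinates.
    a, b = u.strip('()').split(',')
    # Swap the order; strip() removes the space left after the comma.
    return "(%s, %s)" % (b.strip(), a.strip())

# map() applies the function to every cell and returns one value per row.
gdf["Geolocation_correct"] = gdf["Geolocation"].map(fix_geo_location)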

Answer 2

Alternatively, you could do this:

>>> gdf['Geolocation'].map(lambda a: str(tuple(map(float, a.strip('()').split(',')))[::-1]))
0    (-120.298973, 50.673675)
1    (-120.329258, 50.678354)
2    (-120.333317, 50.672496)
3    (-120.332912, 50.673359)
4     (-120.32978, 50.673411)
Name: Geolocation, dtype: object
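Another option, not taken from either answer, is to let pandas do the swap with a regular expression. The sketch below assumes every cell has the form "(number, number)" and uses a small made-up frame; the column name Geolocation_correct follows the question.

```python
import pandas as pd

# Made-up sample rows in the same format as the question's column.
gdf = pd.DataFrame({'Geolocation': ['(50.673675, -120.298973)',
                                    '(50.678354, -120.329258)',
                                    '(50.673411, -120.32978)']})

# Capture the two numbers, then write them back in reverse order.
pattern = r'\(\s*([^,\s]+)\s*,\s*([^)\s]+)\s*\)'
gdf['Geolocation_correct'] = gdf['Geolocation'].str.replace(
    pattern, r'(\2, \1)', regex=True)

print(gdf['Geolocation_correct'])
```

This keeps the values as strings, like the original column. Cells that do not match the pattern are left unchanged, so it is worth checking the result if the data may contain other formats.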

Swapping string elements in each column in Pandas
