Question

Lon_X        Lat_Y
5,234234     6,3234234
5,234234     6,3234234
5,234234     6,3234234
5,234234     6,3234234
5,234234     6,3234234

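I have GPS coordinates in the pandas DataFrame above. However, they use a comma as the decimal separator. What is the best way to convert them to float GPS coordinates using pandas?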

for item in frame.Lon_X:
    float(item.replace(",", ".")) # makes the conversion but does not store it back

I have tried the iteritems function, but it seems slow and gives me a warning that I don't quite understand:

for index, value in frame.Lon_X.iteritems():
    frame.Lon_X[index] = float(value.replace(",", "."))

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  from ipykernel import kernelapp as app

Answer 1

You can apply pandas' vectorized string methods in place along an axis:

def to_float_inplace(x):
    x[:] = x.str.replace(',', '.').astype(float)

df.apply(to_float_inplace)
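Whether a write through x[:] inside apply reaches the original frame may depend on the pandas version and its copy semantics. As a hedged alternative, the sketch below uses the same vectorized string methods but assigns each converted column back explicitly. The two-row frame is made-up sample data in the shape of the question, and regex=False is passed so the comma is treated as a literal character.

```python
import pandas as pd

# Made-up sample data in the same shape as the question's frame.
df = pd.DataFrame({
    "Lon_X": ["5,234234", "5,234234"],
    "Lat_Y": ["6,3234234", "6,3234234"],
})

# Convert column by column and store the result back in the frame.
for col in ["Lon_X", "Lat_Y"]:
    # regex=False: replace the literal comma, no pattern matching.
    df[col] = df[col].str.replace(",", ".", regex=False).astype(float)

print(df.dtypes)
```

Assigning through df[col] writes the converted values back to the frame itself, which is the step the loop in the question was missing.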

Answer 2

Try this:

df.applymap(lambda x: float(x.replace(",", ".")))

Edit: I had forgotten the map, as @Psidom showed.

Answer 3

You can use applymap:

df[["Lon_X", "Lat_Y"]] = df[["Lon_X", "Lat_Y"]].applymap(lambda x: float(x.replace(",", ".")))
df
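A small sketch of the element-wise approach on made-up sample data follows. As far as I know, applymap is deprecated in pandas 2.1 and later in favour of DataFrame.map, so the sketch picks whichever of the two is available. The helper name comma_to_float and the variable elementwise are hypothetical names chosen for illustration.

```python
import pandas as pd

# Made-up sample data in the same shape as the question's frame.
df = pd.DataFrame({
    "Lon_X": ["5,234234", "5,234234"],
    "Lat_Y": ["6,3234234", "6,3234234"],
})
cols = ["Lon_X", "Lat_Y"]

def comma_to_float(value):
    """Convert one string such as '5,234234' to a float (hypothetical helper)."""
    return float(value.replace(",", "."))

# Use DataFrame.map where it exists (assumed: pandas >= 2.1), else applymap.
elementwise = getattr(df[cols], "map", None) or df[cols].applymap
converted = elementwise(comma_to_float)

# Store the converted columns back in the frame.
df[cols] = converted
print(df)
```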

Here are some benchmarks of these alternatives; to_float_inplace is clearly faster than all the other methods:

Data:

df = pd.DataFrame({"Lon_X": ["5,234234" for i in range(1000000)], "Lat_Y": ["6,3234234" for i in range(1000000)]})

# to_float_inplace
def to_float_inplace(x):
    x[:] = x.str.replace(',', '.').astype(float)

%timeit df.apply(to_float_inplace)
# 1 loops, best of 3: 269 ms per loop

# applymap + astype
%timeit df.applymap(lambda x: x.replace(",", ".")).astype(float)
# 1 loops, best of 3: 1.26 s per loop

# to_float
def to_float(x):
    return x.str.replace(',', '.').astype(float)

%timeit df.apply(to_float)
# 1 loops, best of 3: 1.47 s per loop

# applymap + float
%timeit df.applymap(lambda x: float(x.replace(",", ".")))
# 1 loops, best of 3: 1.75 s per loop

# replace with regex
%timeit df.replace(',', '.', regex=True).astype(float)
# 1 loops, best of 3: 1.79 s per loop

Answer 4

You can skip apply and use the replace method directly with regex=True:

df.replace(',', '.', regex=True).astype(float)
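Note that replace with regex=True acts on every string column of the frame it is called on. The sketch below, on made-up data with a hypothetical extra "name" column, restricts the replacement to the coordinate columns so that other text columns containing commas are left alone.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a,b", "c,d"],  # hypothetical text column that must keep its commas
    "Lon_X": ["5,234234", "5,234234"],
    "Lat_Y": ["6,3234234", "6,3234234"],
})

coord_cols = ["Lon_X", "Lat_Y"]
# Replace the comma only in the coordinate columns, then cast to float.
df[coord_cols] = df[coord_cols].replace(",", ".", regex=True).astype(float)
print(df)
```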

Answer 5

Surprisingly, looping over a NumPy array seems to be faster than using pd.Series.str.replace. I ran the following experiment on a Series with 2 million rows:

import timeit

setup = '''
import pandas as pd
import numpy as np
a = pd.Series(list('aabc') * 500000)
b = a.values.astype(str)
'''

a = '''
a[:] = a.str.replace("b", "d")
'''
b = '''
b[:] = np.char.replace(b, "b", "d")
'''
c = '''
for i, x in enumerate(b):
    if "b" in x:
        b[i] = "d"
'''
a_speed = min(timeit.Timer(a, setup=setup).repeat(7, 5))
b_speed = min(timeit.Timer(b, setup=setup).repeat(7, 5))
c_speed = min(timeit.Timer(c, setup=setup).repeat(7, 5))

Results:

a_speed = 2.3304627019997497

b_speed = 6.832672896000076

c_speed = 1.9407824309996613
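Before comparing timings it may be worth checking that the three variants really produce the same output. The sketch below does that on a much smaller, made-up Series. It uses to_numpy() rather than .values as an assumption to get a plain NumPy array, and the names via_pandas, via_numpy and via_loop are hypothetical.

```python
import numpy as np
import pandas as pd

a = pd.Series(list("aabc") * 5)   # small stand-in for the 2-million-row Series
b = a.to_numpy().astype(str)      # fixed-width NumPy string array

# Variant a: pandas string method.
via_pandas = a.str.replace("b", "d", regex=False).tolist()

# Variant b: NumPy's vectorized string replace.
via_numpy = np.char.replace(b, "b", "d").tolist()

# Variant c: explicit loop. It overwrites the whole element with "d",
# which matches a substring replace only because every element here
# is a single character.
looped = b.copy()
for i, x in enumerate(looped):
    if "b" in x:
        looped[i] = "d"
via_loop = looped.tolist()

print(via_pandas == via_numpy == via_loop)
```

On data with longer strings the loop variant would no longer be a like-for-like comparison, since it replaces the entire element rather than the matching substring.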
