Lon_X Lat_Y
5,234234 6,3234234
5,234234 6,3234234
5,234234 6,3234234
5,234234 6,3234234
5,234234 6,3234234
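I have GPS coordinates in the pandas DataFrame above. However, they use a comma as the decimal separator. What is the best way to convert them to float GPS coordinates using pandas?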
for item in frame.Lon_X:
    float(item.replace(",", "."))  # makes the conversion but does not store it back
I have also tried the iteritems function, but it seems slow and gives me a warning that I do not quite understand:
for index, value in frame.Lon_X.iteritems():
    frame.Lon_X[index] = float(value.replace(",", "."))
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
from ipykernel import kernelapp as app
Answer 0 (score: 1)
You can apply pandas' vectorized string methods in place along an axis:
def to_float_inplace(x):
    # Overwrite the column's values with the parsed floats
    x[:] = x.str.replace(',', '.').astype(float)
df.apply(to_float_inplace)
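The same vectorized conversion can also be written without mutating anything inside apply, by assigning the converted columns back to the frame. This is only a minimal sketch: the sample frame below is made up to match the question's layout, and regex=False is passed so that the comma is treated as a literal character.

```python
import pandas as pd

# Sample data shaped like the question's frame (strings with decimal commas).
df = pd.DataFrame({"Lon_X": ["5,234234"] * 5, "Lat_Y": ["6,3234234"] * 5})

# Convert each coordinate column with the vectorized string methods and
# assign the result back, so the frame ends up holding float columns.
for col in ["Lon_X", "Lat_Y"]:
    df[col] = df[col].str.replace(",", ".", regex=False).astype(float)

print(df.dtypes)
```

Assigning whole columns back like this also sidesteps the element-by-element writes that triggered the warning in the question.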
Answer 1 (score: 0)
Try this:
df.applymap(lambda x: float(x.replace(",", ".")))
Edit: I forgot the map, as @Psidom shows.
Answer 2 (score: 0)
You can use applymap:
df[["Lon_X", "Lat_Y"]] = df[["Lon_X", "Lat_Y"]].applymap(lambda x: float(x.replace(",", ".")))
df
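On recent pandas releases, applymap emits a deprecation warning because it was renamed to DataFrame.map in pandas 2.1. The sketch below picks whichever method is available; the helper name comma_to_float and the sample frame are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Lon_X": ["5,234234"] * 3, "Lat_Y": ["6,3234234"] * 3})


def comma_to_float(value):
    """Parse a string such as '5,234234' into the float 5.234234."""
    return float(value.replace(",", "."))


cols = ["Lon_X", "Lat_Y"]
subset = df[cols]
# DataFrame.map exists from pandas 2.1 on; older versions only have applymap.
elementwise = subset.map if hasattr(pd.DataFrame, "map") else subset.applymap
df[cols] = elementwise(comma_to_float)

print(df)
```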
Here are some benchmarks of these alternatives; to_float_inplace is significantly faster than all the other methods:
Data:
import pandas as pd
df = pd.DataFrame({"Lon_X": ["5,234234" for i in range(1000000)], "Lat_Y": ["6,3234234" for i in range(1000000)]})
# to_float_inplace
def to_float_inplace(x):
    x[:] = x.str.replace(',', '.').astype(float)
%timeit df.apply(to_float_inplace)
# 1 loops, best of 3: 269 ms per loop
# applymap + astype
%timeit df.applymap(lambda x: x.replace(",", ".")).astype(float)
# 1 loops, best of 3: 1.26 s per loop
# to_float
def to_float(x):
    return x.str.replace(',', '.').astype(float)
%timeit df.apply(to_float)
# 1 loops, best of 3: 1.47 s per loop
# applymap + float
%timeit df.applymap(lambda x: float(x.replace(",", ".")))
# 1 loops, best of 3: 1.75 s per loop
# replace with regex
%timeit df.replace(',', '.', regex=True).astype(float)
# 1 loops, best of 3: 1.79 s per loop
Answer 3 (score: 0)
You can skip apply and use the replace method directly with regex=True:
df.replace(',', '.', regex=True).astype(float)
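Because df.replace with regex=True touches every string column, a frame that also holds free text would have its commas rewritten too, and the astype(float) call would then fail on that text. A minimal sketch of restricting the replacement to the coordinate columns follows; the extra name column is hypothetical and only there to show that it is left alone.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a,b", "c,d"],  # hypothetical text column that must keep its commas
    "Lon_X": ["5,234234", "5,234234"],
    "Lat_Y": ["6,3234234", "6,3234234"],
})

coord_cols = ["Lon_X", "Lat_Y"]
# Replace the decimal comma only in the coordinate columns, then cast to float.
df[coord_cols] = df[coord_cols].replace(",", ".", regex=True).astype(float)

print(df)
```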
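Answer 4 (score: 0)
Surprisingly, iterating over the NumPy array seems to be faster than using pd.Series.str.replace. I ran the following experiment on a 2-million-row Series:
import timeit
setup = '''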
import pandas as pd
import numpy as np
a = pd.Series(list('aabc') * 500000)
b = a.values.astype(str)
'''
a = '''
a[:] = a.str.replace("b", "d")
'''
b = '''
b[:] = np.char.replace(b, "b", "d")
'''
c = '''
for i, x in enumerate(b):
    if "b" in x:
        b[i] = "d"
'''
a_speed = min(timeit.Timer(a, setup=setup).repeat(7, 5))
b_speed = min(timeit.Timer(b, setup=setup).repeat(7, 5))
c_speed = min(timeit.Timer(c, setup=setup).repeat(7, 5))
Results:
a_speed = 2.3304627019997497
b_speed = 6.832672896000076
c_speed = 1.9407824309996613
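Before reading too much into the timings, it can help to check that the three variants produce the same output. The sketch below does that on a small stand-in Series; the names via_pandas, via_numpy and via_loop are illustrative. Note that the loop overwrites the whole element with "d", which matches the other two only because every element here is a single character.

```python
import numpy as np
import pandas as pd

# Small stand-in for the 2-million-row Series used in the timings.
a = pd.Series(list("aabc") * 3)
b = a.to_numpy().astype(str)

# Variant a: pandas string method.
via_pandas = a.str.replace("b", "d", regex=False).tolist()

# Variant b: NumPy's vectorized string routine.
via_numpy = np.char.replace(b, "b", "d").tolist()

# Variant c: plain Python loop over a copy of the array.
looped = b.copy()
for i, x in enumerate(looped):
    if "b" in x:
        looped[i] = "d"
via_loop = looped.tolist()

print(via_pandas == via_numpy == via_loop)
```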