Trying to convert a column of strings to float

Asked: 2017-06-01 20:58:24

Tags: python arrays python-3.x pandas

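I have a column in a DataFrame that currently holds strings. I need to convert this data to floats and extract it as an array so that I can work with the coordinate pairs.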

In [55]:apt_data['geotag']

Out[55]:

 0        (40.7763, -73.9529)
 1     (40.72785, -73.983307)
 2        (40.7339, -74.0054)
 3    (40.771731, -73.956313)
 4      (40.8027, -73.949187)
Name: geotag, dtype: object

First I tried:

apt_loc = apt_data['geotag']
apt_loc_ar = np.array(apt_loc['geotag'], dtype=dt)

But this raised the following error:

Traceback (most recent call last):

File "<ipython-input-60-3a853e355c9a>", line 1, in <module>
apt_loc_ar = np.array(apt_loc['geotag'], dtype=dt)

File "/python3.5/site-packages/pandas/core/series.py", line 603, in __getitem__
result = self.index.get_value(self, key)

File "/python3.5/site-packages/pandas/indexes/base.py", line 2169, in get_value
tz=getattr(series.dtype, 'tz', None))

File "pandas/index.pyx", line 98, in pandas.index.IndexEngine.get_value (pandas/index.c:3557)

File "pandas/index.pyx", line 106, in pandas.index.IndexEngine.get_value (pandas/index.c:3240)

File "pandas/index.pyx", line 156, in pandas.index.IndexEngine.get_loc (pandas/index.c:4363)

KeyError: 'geotag'

I also tried apt_data['geotag'] = pd.to_numeric(apt_data['geotag'], errors='coerce')

This gave NaN for every entry.

Thanks.

3 answers:

Answer 0 (score: 1)

You can use literal_eval from the ast module and apply it to your DataFrame column, like this:

import pandas as pd
from ast import literal_eval as le

df = pd.DataFrame(["(40.7763, -73.9529)","(40.72785, -73.983307)"], columns=["geotag"])

df["geotag"] = df["geotag"].apply(func=lambda x: le(x))

Output:

>>> for k in df["geotag"]:
        for j in k: print(type(j))
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
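
To go one step further and get the NumPy array the question asks for, something along these lines may work. It is only a sketch: the names `parsed` and `coords` are introduced here for illustration, and it assumes every row parses to exactly two numbers. It also avoids the KeyError from the question, which arises because `apt_loc` is already a Series, so `apt_loc['geotag']` is treated as an index-label lookup. Similarly, `pd.to_numeric` with `errors='coerce'` returns NaN because each string holds a parenthesised pair rather than a single number.

```python
import numpy as np
import pandas as pd
from ast import literal_eval

df = pd.DataFrame(
    ["(40.7763, -73.9529)", "(40.72785, -73.983307)"],
    columns=["geotag"],
)

# Parse each string into a tuple of floats.
parsed = df["geotag"].apply(literal_eval)

# Stack the tuples into a 2-D float array, one coordinate pair per row.
coords = np.array(parsed.tolist(), dtype=float)

print(coords.shape)  # (2, 2)
print(coords.dtype)  # float64
```

Each row of `coords` is then one pair, so `coords[:, 0]` and `coords[:, 1]` give the first and second values of every pair.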

Answer 1 (score: 1)

A shorter version of Chiheb's answer (no import needed):

apt_data.geotag.apply(eval)
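
One caveat worth keeping in mind: `eval` executes any Python expression it is given, whereas `ast.literal_eval` accepts only literals and raises on anything else. If the column could contain text from an untrusted source, the difference matters. A small sketch (the `suspicious` string is a made-up example, not data from the question):

```python
from ast import literal_eval

good = "(40.7763, -73.9529)"
suspicious = "__import__('os').getcwd()"  # hypothetical non-literal input

# A literal tuple parses fine.
print(literal_eval(good))  # (40.7763, -73.9529)

# Anything that is not a plain literal is rejected instead of executed.
try:
    literal_eval(suspicious)
    rejected = False
except ValueError:
    rejected = True

print(rejected)  # True
```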

Answer 2 (score: 1)

Consider the Series g

g = pd.Series(
    [
        '(40.7763, -73.9529)',
        '(40.72785, -73.983307)',
        '(40.7339, -74.0054)',
        '(40.771731, -73.956313)',
        '(40.8027, -73.949187)'
    ], name='geotag'
)

Option 1
literal_eval with apply

from ast import literal_eval
import pandas as pd

g.apply(literal_eval)

0        (40.7763, -73.9529)
1     (40.72785, -73.983307)
2        (40.7339, -74.0054)
3    (40.771731, -73.956313)
4      (40.8027, -73.949187)
Name: geotag, dtype: object

Option 2
literal_eval in a comprehension, then rebuilding the Series

pd.Series([literal_eval(v) for v in g.values.tolist()], g.index, name=g.name)

0        (40.7763, -73.9529)
1     (40.72785, -73.983307)
2        (40.7339, -74.0054)
3    (40.771731, -73.956313)
4      (40.8027, -73.949187)
Name: geotag, dtype: object

Option 3
str methods with apply

g.apply(lambda x: [float(y) for y in x.strip('()').split(', ')])

0        [40.7763, -73.9529]
1     [40.72785, -73.983307]
2        [40.7339, -74.0054]
3    [40.771731, -73.956313]
4      [40.8027, -73.949187]
Name: geotag, dtype: object

Option 4
str methods in a comprehension

pd.Series([[float(x) for x in v.strip('()').split(', ')] for v in g.values.tolist()], g.index, name=g.name)

0        [40.7763, -73.9529]
1     [40.72785, -73.983307]
2        [40.7339, -74.0054]
3    [40.771731, -73.956313]
4      [40.8027, -73.949187]
Name: geotag, dtype: object
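
If separate numeric columns are more convenient than a column of lists, the `.str` accessor offers another route. This is a sketch rather than one of the timed options; the column names `x0` and `x1` are placeholders chosen here, and it assumes every value has the form `(a, b)` with `', '` as the separator.

```python
import pandas as pd

g = pd.Series(
    [
        '(40.7763, -73.9529)',
        '(40.72785, -73.983307)',
        '(40.7339, -74.0054)',
        '(40.771731, -73.956313)',
        '(40.8027, -73.949187)'
    ], name='geotag'
)

# Drop the parentheses, split on ', ' into two columns, and cast to float.
parts = g.str.strip('()').str.split(', ', expand=True).astype(float)
parts.columns = ['x0', 'x1']  # placeholder names

# A plain 2-D float array, one coordinate pair per row.
coords = parts.to_numpy()
print(coords.shape)  # (5, 2)
```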

Timing

%timeit g.apply(literal_eval)
10000 loops, best of 3: 158 µs per loop

%timeit g.apply(lambda x: [float(y) for y in x.strip('()').split(', ')])
10000 loops, best of 3: 107 µs per loop

%timeit pd.Series([literal_eval(v) for v in g.values.tolist()], g.index, name=g.name)
10000 loops, best of 3: 119 µs per loop

%timeit pd.Series([[float(x) for x in v.strip('()').split(', ')] for v in g.values.tolist()], g.index, name=g.name)
10000 loops, best of 3: 65.3 µs per loop
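
These timings are for a five-row Series, where fixed overhead is likely to dominate. To compare the approaches on something closer to a real column, a sketch like the following could be used outside IPython. The helper names and the repetition factor are arbitrary choices made here, and no results are claimed.

```python
import timeit
from ast import literal_eval

import pandas as pd

# A larger Series built by repeating two sample values (arbitrary size).
g = pd.Series(
    ['(40.7763, -73.9529)', '(40.72785, -73.983307)'] * 5000,
    name='geotag',
)

def with_literal_eval(s):
    # Option 1: parse each string with literal_eval.
    return s.apply(literal_eval)

def with_str_methods(s):
    # Option 4: strip, split and convert inside a comprehension.
    return pd.Series(
        [[float(x) for x in v.strip('()').split(', ')] for v in s.tolist()],
        s.index, name=s.name,
    )

# Sanity check: both approaches yield the same numbers.
assert [list(t) for t in with_literal_eval(g)] == with_str_methods(g).tolist()

for fn in (with_literal_eval, with_str_methods):
    seconds = timeit.timeit(lambda: fn(g), number=5)
    print(fn.__name__, seconds / 5, "s per call")
```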