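I have a column in a DataFrame that currently holds strings. I need to convert this data to floats and extract it as an array so that I can work with the coordinate pairs.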
In [55]: apt_data['geotag']
Out[55]:
0 (40.7763, -73.9529)
1 (40.72785, -73.983307)
2 (40.7339, -74.0054)
3 (40.771731, -73.956313)
4 (40.8027, -73.949187)
Name: geotag, dtype: object
First I tried:
apt_loc = apt_data['geotag']
apt_loc_ar = np.array(apt_loc['geotag'], dtype=dt)
But this raised the following error:
Traceback (most recent call last):
File "<ipython-input-60-3a853e355c9a>", line 1, in <module>
apt_loc_ar = np.array(apt_loc['geotag'], dtype=dt)
File "/python3.5/site-packages/pandas/core/series.py", line 603, in __getitem__
result = self.index.get_value(self, key)
File "/python3.5/site-packages/pandas/indexes/base.py", line 2169, in get_value
tz=getattr(series.dtype, 'tz', None))
File "pandas/index.pyx", line 98, in pandas.index.IndexEngine.get_value (pandas/index.c:3557)
File "pandas/index.pyx", line 106, in pandas.index.IndexEngine.get_value (pandas/index.c:3240)
File "pandas/index.pyx", line 156, in pandas.index.IndexEngine.get_loc (pandas/index.c:4363)
KeyError: 'geotag'
I also tried:
apt_data['geotag'] = pd.to_numeric(apt_data['geotag'], errors='coerce')
This produced NaN for every entry.
Thanks.
Answer 0 (score: 1)
You can use literal_eval from the ast module and apply it to your DataFrame column as follows:
import pandas as pd
from ast import literal_eval as le
df = pd.DataFrame(["(40.7763, -73.9529)","(40.72785, -73.983307)"], columns=["geotag"])
df["geotag"] = df["geotag"].apply(func=lambda x: le(x))
Output:
>>> for k in df["geotag"]:
...     for j in k: print(type(j))
<class 'float'>
<class 'float'>
<class 'float'>
<class 'float'>
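Since the question asks for an array of coordinate pairs, the parsed tuples can then be stacked into a NumPy array. The following is a minimal sketch, assuming every string parses to a tuple of exactly two numbers; the names `parsed` and `coords` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd
from ast import literal_eval

df = pd.DataFrame(
    ["(40.7763, -73.9529)", "(40.72785, -73.983307)"], columns=["geotag"]
)

# Parse each string into a tuple of floats.
parsed = df["geotag"].apply(literal_eval)

# Stack the tuples into a 2-D float array, one row per coordinate pair.
# Assumption: every tuple has the same length (two values).
coords = np.array(parsed.tolist(), dtype=float)

print(coords.shape)  # (2, 2)
print(coords.dtype)  # float64
```

Individual columns are then available as coords[:, 0] and coords[:, 1].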
Answer 1 (score: 1)
A shorter version of Chiheb's answer (no import needed):
apt_data.geotag.apply(eval)
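One caveat worth keeping in mind: eval executes whatever expression the string contains, whereas literal_eval only accepts Python literals and raises on anything else. If the strings may come from an untrusted source, a defensive wrapper along these lines is a safer starting point. This is only a sketch; the helper name `parse_geotag` is hypothetical.

```python
from ast import literal_eval

def parse_geotag(text):
    """Parse '(a, b)' into a tuple of floats.

    Returns None if the text is not a literal sequence of numbers.
    """
    try:
        value = literal_eval(text)
        return tuple(float(v) for v in value)
    except (ValueError, SyntaxError, TypeError):
        # literal_eval rejects non-literals (e.g. function calls);
        # float() or iteration may fail on unexpected shapes.
        return None

print(parse_geotag("(40.7763, -73.9529)"))        # (40.7763, -73.9529)
print(parse_geotag("__import__('os').getcwd()"))  # None
```

It can be used the same way as eval above, for example apt_data.geotag.apply(parse_geotag).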
Answer 2 (score: 1)
Consider the Series g
g = pd.Series(
[
'(40.7763, -73.9529)',
'(40.72785, -73.983307)',
'(40.7339, -74.0054)',
'(40.771731, -73.956313)',
'(40.8027, -73.949187)'
], name='geotag'
)
Option 1
literal_eval with apply
from ast import literal_eval
import pandas as pd
g.apply(literal_eval)
0 (40.7763, -73.9529)
1 (40.72785, -73.983307)
2 (40.7339, -74.0054)
3 (40.771731, -73.956313)
4 (40.8027, -73.949187)
Name: geotag, dtype: object
Option 2
literal_eval in a comprehension, reconstructing the Series
pd.Series([literal_eval(v) for v in g.values.tolist()], g.index, name=g.name)
0 (40.7763, -73.9529)
1 (40.72785, -73.983307)
2 (40.7339, -74.0054)
3 (40.771731, -73.956313)
4 (40.8027, -73.949187)
Name: geotag, dtype: object
Option 3
str functions with apply
g.apply(lambda x: [float(y) for y in x.strip('()').split(', ')])
0 [40.7763, -73.9529]
1 [40.72785, -73.983307]
2 [40.7339, -74.0054]
3 [40.771731, -73.956313]
4 [40.8027, -73.949187]
Name: geotag, dtype: object
Option 4
str functions in a comprehension
pd.Series([[float(x) for x in v.strip('()').split(', ')] for v in g.values.tolist()], g.index, name=g.name)
0 [40.7763, -73.9529]
1 [40.72785, -73.983307]
2 [40.7339, -74.0054]
3 [40.771731, -73.956313]
4 [40.8027, -73.949187]
Name: geotag, dtype: object
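A further route, not among the four options above, is to stay within pandas' vectorised string methods and split the column straight into two float columns. This is a sketch under the assumption that every entry has exactly the '(a, b)' form with a comma-plus-space separator; the column names `first` and `second` are placeholders.

```python
import pandas as pd

g = pd.Series(
    ['(40.7763, -73.9529)', '(40.72785, -73.983307)', '(40.7339, -74.0054)'],
    name='geotag',
)

# Strip the parentheses, split on the separator into two columns, cast to float.
parts = g.str.strip('()').str.split(', ', expand=True).astype(float)
parts.columns = ['first', 'second']

# A plain (n, 2) float array of the coordinate pairs.
coords = parts.values
print(coords.shape)  # (3, 2)
```

Unlike Options 1 to 4, the result here is a two-column DataFrame with a float dtype rather than an object Series of tuples or lists, which makes the final conversion to an array direct.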
Timing
%timeit g.apply(literal_eval)
10000 loops, best of 3: 158 µs per loop
%timeit g.apply(lambda x: [float(y) for y in x.strip('()').split(', ')])
10000 loops, best of 3: 107 µs per loop
%timeit pd.Series([literal_eval(v) for v in g.values.tolist()], g.index, name=g.name)
10000 loops, best of 3: 119 µs per loop
%timeit pd.Series([[float(x) for x in v.strip('()').split(', ')] for v in g.values.tolist()], g.index, name=g.name)
10000 loops, best of 3: 65.3 µs per loop
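The %timeit magic is only available in IPython. Outside it, a comparable measurement can be sketched with the standard timeit module. Absolute numbers will differ by machine and library version, and with only five rows the fixed per-call overhead likely accounts for much of the measured time, so it is worth repeating the comparison on data of realistic size. The sample Series and function names below are assumptions for illustration.

```python
import timeit
from ast import literal_eval

import pandas as pd

# A larger sample (1000 rows) built by repeating two of the strings above.
g = pd.Series(
    ['(40.7763, -73.9529)', '(40.72785, -73.983307)'] * 500, name='geotag'
)

def with_apply():
    # Option 1: literal_eval with apply.
    return g.apply(literal_eval)

def with_comprehension():
    # Option 4: str functions in a comprehension.
    return pd.Series(
        [[float(x) for x in v.strip('()').split(', ')] for v in g.values.tolist()],
        g.index, name=g.name,
    )

for fn in (with_apply, with_comprehension):
    # Best of 3 repeats of 20 calls each, reported per call.
    best = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f"{fn.__name__}: {best * 1e6:.1f} µs per call")
```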