I have a .csv file with GPS data that looks like this:
ID,GPS_LATITUDE,GPS_LONGITUDE
1,35.66727683,139.7591279
2,35.66727683,139.7591279
3,-1,-1
4,35.66750697,139.7589757
5,,139.7589757
Some values, such as the latitude in the last row, are empty or "null". I want to read the data into a DataFrame, set the null values to -1, and read the columns as float. With my code, the data type ends up as string and the empty values are not replaced.
How I did it (wrong):
import numpy as np
import pandas as pd
data = r'c:\temp\gps.csv'
def conv(val):
    if val == np.nan:
        return -1
    return val
df = pd.read_csv(data,
                 converters={'GPS_LATITUDE': conv, 'GPS_LONGITUDE': conv},
                 dtype={'GPS_LATITUDE': np.float64, 'GPS_LONGITUDE': np.float64})
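The check below is a minimal, self-contained sketch of why this attempt leaves the columns as strings. To keep it self-contained, the sample rows are inlined with `io.StringIO` instead of being read from c:\temp\gps.csv. The `seen` list is a hypothetical helper that only records what the converter receives.

pandas hands each raw cell to a converter as text, so the empty latitude arrives as `''` and never equals `np.nan`. NaN also compares unequal to itself. When a column has both a converter and a dtype, pandas uses only the converter, and recent versions typically emit a ParserWarning saying so.

```python
import io
import warnings

import numpy as np
import pandas as pd

# Sample rows from the question, inlined instead of c:\temp\gps.csv.
CSV = """ID,GPS_LATITUDE,GPS_LONGITUDE
1,35.66727683,139.7591279
5,,139.7589757
"""

seen = []  # hypothetical helper: records every raw value the converter gets

def conv(val):
    seen.append(val)
    if val == np.nan:  # always False: val is a str, and NaN != NaN anyway
        return -1
    return val  # the raw string is returned unchanged

with warnings.catch_warnings():
    # Silence the "only the converter will be used" ParserWarning.
    warnings.simplefilter("ignore")
    df = pd.read_csv(
        io.StringIO(CSV),
        converters={'GPS_LATITUDE': conv},
        dtype={'GPS_LATITUDE': np.float64},
    )

print(seen)                      # raw strings, including '' for the blank cell
print(df['GPS_LATITUDE'].dtype)  # not float64: the dtype argument was ignored
print(np.nan == np.nan)          # False
```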
Answer:
First, you don't need a converter function at all:
$ cat /tmp/a.csv
ID,GPS_LATITUDE,GPS_LONGITUDE
1,35.66727683,139.7591279
2,35.66727683,139.7591279
3,-1,-1
4,35.66750697,139.7589757
5,,139.7589757
In [15]: df = pd.read_csv("/tmp/a.csv", dtype={'GPS_LATITUDE':np.float64,'GPS_LONGITUDE':np.float64})
In [16]: df
Out[16]:
ID GPS_LATITUDE GPS_LONGITUDE
0 1 35.667277 139.759128
1 2 35.667277 139.759128
2 3 -1.000000 -1.000000
3 4 35.667507 139.758976
4 5 NaN 139.758976
In [18]: df.dtypes
Out[18]:
ID int64
GPS_LATITUDE float64
GPS_LONGITUDE float64
dtype: object
In [19]: df.fillna(-1, inplace = True)
In [20]: df
Out[20]:
ID GPS_LATITUDE GPS_LONGITUDE
0 1 35.667277 139.759128
1 2 35.667277 139.759128
2 3 -1.000000 -1.000000
3 4 35.667507 139.758976
4 5 -1.000000 139.758976
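The first approach can also be written as a standalone script. This is a sketch with two changes: the data is inlined with `io.StringIO`, and row 6, containing a literal `null`, is a hypothetical addition that covers the "null" case from the question.

Empty cells and the string `null` are both among pandas' default missing-value markers, so both become NaN before `fillna` runs. Assigning the result of `fillna` instead of passing `inplace=True` is only a stylistic choice.

```python
import io

import numpy as np
import pandas as pd

# Rows from the question; row 6 (a literal "null") is a hypothetical addition.
CSV = """ID,GPS_LATITUDE,GPS_LONGITUDE
1,35.66727683,139.7591279
2,35.66727683,139.7591279
3,-1,-1
4,35.66750697,139.7589757
5,,139.7589757
6,null,139.7589757
"""

# Empty cells and "null" are in pandas' default NA strings, so both are
# parsed as NaN and the columns can be read as float64 directly.
df = pd.read_csv(
    io.StringIO(CSV),
    dtype={'GPS_LATITUDE': np.float64, 'GPS_LONGITUDE': np.float64},
)

# Replace every remaining NaN with -1 (assigning the result instead of inplace=True).
df = df.fillna(-1)

print(df.dtypes)
print(df)
```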
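Second, if you do want to use conv, change it to the following (if you apply conv to every column that needs converting, you don't need to specify dtype):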
In [21]: def conv(val):
   ....:     if not val:
   ....:         return -1
   ....:     return np.float64(val)
   ....:
In [22]: df = pd.read_csv("/tmp/a.csv", converters={'GPS_LATITUDE':conv,'GPS_LONGITUDE':conv})
In [23]: df
Out[23]:
ID GPS_LATITUDE GPS_LONGITUDE
0 1 35.667277 139.759128
1 2 35.667277 139.759128
2 3 -1.000000 -1.000000
3 4 35.667507 139.758976
4 5 -1.000000 139.758976
In [24]: df.dtypes
Out[24]:
ID int64
GPS_LATITUDE float64
GPS_LONGITUDE float64
dtype: object
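The converter above only catches empty cells. pandas passes converters the raw cell text without applying its usual NA detection, so a literal `null` would reach `np.float64` and raise ValueError.

Below is a sketch of a slightly stricter converter. It assumes the missing markers are exactly `''`, `'null'` and `'NULL'`; the `MISSING` set and the extra `null` row are hypothetical. Returning `np.float64` for every cell, including the -1 default, keeps the whole column numeric.

```python
import io

import numpy as np
import pandas as pd

# Sample data from the question plus a hypothetical row with a literal "null".
CSV = """ID,GPS_LATITUDE,GPS_LONGITUDE
1,35.66727683,139.7591279
3,-1,-1
5,,139.7589757
6,null,139.7589757
"""

# Hypothetical set of cell texts treated as missing; adjust to your data.
MISSING = {'', 'null', 'NULL'}

def conv(val):
    """Return the cell as np.float64, or -1.0 for cells treated as missing."""
    text = val.strip()
    if text in MISSING:
        return np.float64(-1)
    return np.float64(text)

df = pd.read_csv(
    io.StringIO(CSV),
    converters={'GPS_LATITUDE': conv, 'GPS_LONGITUDE': conv},
)
print(df.dtypes)
print(df)
```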
In either case:
In [26]: lats = df['GPS_LATITUDE'].tolist()
In [27]: for l in lats:
   ....:     print(l,type(l))
   ....:
(35.667276829999999, <type 'numpy.float64'>)
(35.667276829999999, <type 'numpy.float64'>)
(-1.0, <type 'numpy.float64'>)
(35.667506969999998, <type 'numpy.float64'>)
(-1.0, <type 'numpy.float64'>)
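The per-element type printed above can differ between versions. Recent pandas documents `Series.tolist()` as returning Python scalars, while the session above shows `numpy.float64`. A column-level check avoids depending on that.

Below is a sketch using `pandas.api.types.is_float_dtype`. The helper name `check_gps_columns` and its messages are hypothetical. It also relies on the fact that `numpy.float64` subclasses Python's `float`.

```python
import io

import numpy as np
import pandas as pd

CSV = """ID,GPS_LATITUDE,GPS_LONGITUDE
1,35.66727683,139.7591279
5,,139.7589757
"""

def check_gps_columns(df, cols=('GPS_LATITUDE', 'GPS_LONGITUDE')):
    """Hypothetical helper: raise if a column is not float or still has NaN."""
    for col in cols:
        if not pd.api.types.is_float_dtype(df[col]):
            raise TypeError(f"{col} has dtype {df[col].dtype}, expected float")
        if df[col].isna().any():
            raise ValueError(f"{col} still contains missing values")

df = pd.read_csv(io.StringIO(CSV)).fillna(-1)
check_gps_columns(df)  # passes: both columns are float64 with no NaN

# numpy.float64 subclasses Python float, so this holds whether tolist()
# yields float or numpy.float64 in your pandas version.
assert all(isinstance(v, float) for v in df['GPS_LATITUDE'].tolist())
```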