Why doesn't `na_values` work correctly in pandas?

Date: 2019-06-30 15:15:51

Tags: python pandas

I am reading this file and replacing -1 with NaN:

import pandas as pd
import os

path = "./data/"

filename = os.path.join(path,"SN_d_tot_V2.0.csv")    
names = ['year', 'month', 'day', 'dec_year', 'sn_value' , 'sn_error', 'obs_num']
df = pd.read_csv(filename,sep=';',header=None,names=names,na_values=['-1'], index_col=False)

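But the replacement only takes effect in the sn_error column (dtype float64), not in the sn_value column. What is wrong? How can I replace the -1 values in sn_value as well?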


2 answers:

Answer 0 (score: 1)

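The values in that column have leading whitespace, so the raw field text is not exactly "-1" and does not match na_values. You need to strip the whitespace when reading the CSV, for example by using a regex separator that swallows the spaces around each ';' (this requires the python engine):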

df = pd.read_csv('http://www.sidc.be/silso/INFO/sndtotcsv.php', 
                 sep=r'\s*;\s*', 
                 engine='python', 
                 header=None, 
                 names=names,
                 na_values=[-1], 
                 index_col=False)
df.head()

   year  month  day  dec_year  sn_value  sn_error  obs_num
0  1818      1    1  1818.001       NaN       NaN        0
1  1818      1    2  1818.004       NaN       NaN        0
2  1818      1    3  1818.007       NaN       NaN        0
3  1818      1    4  1818.010       NaN       NaN        0
4  1818      1    5  1818.012       NaN       NaN        0
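
The effect can be reproduced without downloading the file. The sketch below uses a small made-up sample in the same layout (the rows are illustrative, not real data) and shows two ways to drop the padding: the regex separator used above, and `skipinitialspace=True`, an alternative this answer does not mention that keeps the default C engine.

```python
import io
import pandas as pd

# Made-up sample rows mimicking the file layout (padded fields, ';' separated).
sample = (
    "1818;01;01;1818.001;  -1; -1.0;   0\n"
    "1818;01;02;1818.004;  -1; -1.0;   0\n"
    "1818;01;08;1818.021;  65; 10.2;   1\n"
)
names = ['year', 'month', 'day', 'dec_year', 'sn_value', 'sn_error', 'obs_num']

# Regex separator: swallows whitespace around ';' (needs the python engine).
df_regex = pd.read_csv(io.StringIO(sample), sep=r'\s*;\s*', engine='python',
                       header=None, names=names, na_values=[-1])

# Alternative: keep the C engine and skip the spaces that follow each delimiter.
df_skip = pd.read_csv(io.StringIO(sample), sep=';', skipinitialspace=True,
                      header=None, names=names, na_values=[-1])

print(df_regex[['sn_value', 'sn_error']])
print(df_skip[['sn_value', 'sn_error']])
```

In both frames the padded -1 entries in sn_value and sn_error should come back as NaN, while the third row keeps its values.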

Answer 1 (score: 0)

Use the converters parameter:

df = pd.read_csv(filename,
                 sep=';',
                 header=None,
                 names=names,
                 converters={'sn_value': float, 'sn_error': float},
                 na_values=['-1'],
                 index_col=False)
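
As a variation on this idea, the replacement can be done inside the converter itself, so the result does not depend on how `na_values` interacts with converted columns. This is only a sketch: `minus_one_to_nan` is a hypothetical helper, not part of pandas, and the sample rows are made up to mimic the file layout.

```python
import io
import math
import pandas as pd

# Made-up sample rows mimicking the file layout (padded fields, ';' separated).
sample = (
    "1818;01;01;1818.001;  -1; -1.0;   0\n"
    "1818;01;02;1818.004;  -1; -1.0;   0\n"
    "1818;01;08;1818.021;  65; 10.2;   1\n"
)
names = ['year', 'month', 'day', 'dec_year', 'sn_value', 'sn_error', 'obs_num']

def minus_one_to_nan(text):
    """Hypothetical helper: parse a padded field and map -1 to NaN."""
    value = float(text)  # float() ignores surrounding whitespace
    return math.nan if value == -1 else value

df = pd.read_csv(io.StringIO(sample), sep=';', header=None, names=names,
                 converters={'sn_value': minus_one_to_nan,
                             'sn_error': minus_one_to_nan})

print(df[['sn_value', 'sn_error']])
```

The converter receives each field as a raw string, so the padding is handled in one place and only the two listed columns are affected.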