Values containing commas are read as NaN by pandas read_csv in Python

Asked: 2016-06-24 18:04:43

Tags: python csv pandas dataframe

I wrote the following code to read a CSV file that looks like this:

"Device","Parent Device","Sensor","Location","Time","Value","Units","Status"
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:15:00 AM","927.4","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:20:00 AM","940.3","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:25:00 AM","917.5","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:30:00 AM","1,106.4","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:35:00 AM","1,075.1","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:40:00 AM","1,078.7","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:45:00 AM","1,018.4","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:50:00 AM","1,017.3","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:55:00 AM","1,036.7","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 9:00:00 AM","995.0","kW",""

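As you can see, some of the entries in the "Value" column contain a comma (,) as a thousands separator, and those entries are being read as NaN. I tried the following code: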

import pandas as pd


def get_data(filename):
    # ISO-8859-1
    df = pd.read_csv(filename, thousands=",", sep=",", encoding='ISO-8859-1')

    df["Time"] = pd.to_datetime(df["Time"])
    df["Value"] = pd.to_numeric(df["Value"], errors='coerce')
    df = df.set_index(df["Time"])

    df = df[df["Time"] >= "2015-05-01"]
    df = df[df["Time"] < "2016-05-01"]
    df = df[["Value"]]
    df["Value"] = pd.to_numeric(df["Value"], errors='coerce')
    # Set a conditional interpolate in the future based on dataset size and
    # null values detected
    df = df.interpolate()
    return df

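I am not sure whether the error is caused by the encoding (I also tried the default 'utf-8') or by the way my CSV file is formatted. Can anyone point out the error or suggest an alternative approach? Thanks!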
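One alternative that could be tried is sketched below; it is an assumption, not a confirmed diagnosis. If "Value" still arrives as text such as "1,106.4", pd.to_numeric with errors='coerce' turns it into NaN, so the sketch reads the column as a string and strips the comma before converting. The names SAMPLE and load_values are hypothetical, and the sample is a shortened stand-in for the file above.

```python
import io

import pandas as pd

# Hypothetical, shortened sample in the same style as the file in the question.
SAMPLE = '''"Device","Time","Value","Units"
"PM870","Jul 30, 2015 8:25:00 AM","917.5","kW"
"PM870","Jul 30, 2015 8:30:00 AM","1,106.4","kW"
'''


def load_values(source):
    # Read Value as text so the thousands separator can be removed explicitly.
    df = pd.read_csv(source, dtype={"Value": str})
    df["Time"] = pd.to_datetime(df["Time"], format="%b %d, %Y %I:%M:%S %p")
    # "1,106.4" becomes "1106.4" before the numeric conversion.
    cleaned = df["Value"].str.replace(",", "", regex=False)
    df["Value"] = pd.to_numeric(cleaned, errors="coerce")
    return df.set_index("Time")[["Value"]]


df = load_values(io.StringIO(SAMPLE))

# For comparison: converting the raw strings directly coerces the
# comma-formatted entry to NaN.
raw_attempt = pd.to_numeric(pd.Series(["917.5", "1,106.4"]), errors="coerce")
```

Stripping the separator by hand keeps the conversion independent of how read_csv treats quoted fields, at the cost of one extra pass over the column.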

0 answers:

No answers.