I wrote the code below to read a CSV file that looks like this:
"Device","Parent Device","Sensor","Location","Time","Value","Units","Status"
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:15:00 AM","927.4","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:20:00 AM","940.3","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:25:00 AM","917.5","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:30:00 AM","1,106.4","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:35:00 AM","1,075.1","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:40:00 AM","1,078.7","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:45:00 AM","1,018.4","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:50:00 AM","1,017.3","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:55:00 AM","1,036.7","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 9:00:00 AM","995.0","kW",""
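As you can see, some entries in the "Value" column contain a comma (,) as a thousands separator, and those entries are being read as NaN. I tried the following code: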
import pandas as pd

def get_data(filename):
    # Encoding: ISO-8859-1
    df = pd.read_csv(filename, thousands=",", sep=",", encoding="ISO-8859-1")
    df["Time"] = pd.to_datetime(df["Time"])
    df["Value"] = pd.to_numeric(df["Value"], errors="coerce")
    df = df.set_index(df["Time"])
    df = df[df["Time"] >= "2015-05-01"]
    df = df[df["Time"] < "2016-05-01"]
    df = df[["Value"]]
    df["Value"] = pd.to_numeric(df["Value"], errors="coerce")
    # Set a conditional interpolate in the future based on dataset size and
    # null values detected
    df = df.interpolate()
    return df
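I am not sure whether the error is caused by the encoding (I also tried the default 'utf-8') or by the way my CSV file is formatted. Can anyone point out the error or suggest an alternative approach? Thanks!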
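One alternative that does not depend on the `thousands` option is to strip the commas explicitly before converting each value. The following is only a minimal sketch using the standard library; `parse_value` and `SAMPLE` are hypothetical names introduced for illustration, and `SAMPLE` is an in-memory stand-in for the real file, built from two rows of the sample above.

```python
import csv
import io
import math


def parse_value(text):
    """Convert a string such as "1,106.4" to a float; return NaN if it cannot be parsed."""
    cleaned = text.replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        # Empty or non-numeric fields become NaN instead of raising.
        return math.nan


# Hypothetical stand-in for the real file: header plus two rows from the sample above.
SAMPLE = '''"Device","Parent Device","Sensor","Location","Time","Value","Units","Status"
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:25:00 AM","917.5","kW",""
"Building - Total Power - PM870","BOCDCE (StruxureWare Data Center Expert)","Building Total Power - PM870","","Jul 30, 2015 8:30:00 AM","1,106.4","kW",""
'''

# The csv module keeps commas inside quoted fields, so "1,106.4" arrives as one string.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
values = [parse_value(row["Value"]) for row in rows]
print(values)
```

If this behaves as expected, the same function could presumably be handed to pandas through the `converters` parameter of `pd.read_csv`, for example `converters={"Value": parse_value}`, so that the "Value" column is cleaned while the file is being read.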