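I am trying to plot the duration of some programs that run overnight. I export the program duration data to a CSV file so that I can analyze it later.
Below are my code and a sample of the CSV:
CSV: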
na,programName,totaal,na,startDate,endDate,Date
?,"to/check.apl",54006,?,2017-02-27T20:04:07.233,2017-02-27T20:05:01.239,2017-02-27T00:00:00.000
?,"to/ibx.apl",143887,?,2017-02-27T20:07:55.627,2017-02-27T20:10:19.514,2017-02-27T00:00:00.000
?,"to/checker.apl",2039600,?,2017-02-27T20:14:37.662,2017-02-27T20:48:37.262,2017-02-27T00:00:00.000
Python code:
import matplotlib
from pandas import *
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
matplotlib.style.use('ggplot')
data = "miFile.csv"
df = pd.DataFrame.from_csv(data)
df = df.set_index('totaal')
newDf = df[['programName','startDate','endDate']]
So far I was getting datetime errors, so I tried to fix them like this (still no luck with the plot):
newDf['startDate'] = pd.to_datetime(newDf['startDate'])
newDf['endDate'] = pd.to_datetime(newDf['endDate'])
#pd.to_datetime(pd.Series(["2017-02-27T20:04:07.233"]) format= "%d, %m, %y, %H: %M: %S")
newDf.plot('programName','startDate','endDate')
plt.show()
Answer 0 (score: 2)
I think you need read_csv to create the df, then take the difference of the two date columns to get a timedelta, convert it to minutes, and plot:
from io import StringIO
import pandas as pd
import matplotlib.pyplot as plt

temp=u"""na,programName,totaal,na,startDate,endDate,Date
?,"to/check.apl",54006,?,2017-02-27T20:04:07.233,2017-02-27T20:05:01.239,2017-02-27T00:00:00.000
?,"to/ibx.apl",143887,?,2017-02-27T20:07:55.627,2017-02-27T20:10:19.514,2017-02-27T00:00:00.000
?,"to/checker.apl",2039600,?,2017-02-27T20:14:37.662,2017-02-27T20:48:37.262,2017-02-27T00:00:00.000"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
# column 2 (totaal) becomes the index; columns 4, 5 and 6 are parsed as dates
df = pd.read_csv(StringIO(temp), index_col=[2], parse_dates=[4,5,6])
print (df.dtypes)
na                     object
programName            object
na.1                   object
startDate      datetime64[ns]
endDate        datetime64[ns]
Date           datetime64[ns]
dtype: object
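# whole minutes as floats; pandas 2.x rejects astype('timedelta64[m]'), so floor the seconds instead
df['duration'] = (df['endDate'] - df['startDate']).dt.total_seconds() // 60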
newDf = df[['programName','duration']]
print (newDf)
            programName  duration
totaal
54006      to/check.apl       0.0
143887       to/ibx.apl       2.0
2039600  to/checker.apl      33.0
newDf.plot()
plt.show()
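Whole minutes hide anything shorter than a minute (to/check.apl shows up as 0.0 in the output above). As a minimal sketch, assuming the same three sample rows, the elapsed time can be kept as fractional seconds or minutes with `Series.dt.total_seconds()`. The names `csv_text`, `elapsed`, `seconds` and `minutes` are illustrative and not from the original post.

```python
from io import StringIO

import pandas as pd

# Sample rows copied from the question.
csv_text = """na,programName,totaal,na,startDate,endDate,Date
?,"to/check.apl",54006,?,2017-02-27T20:04:07.233,2017-02-27T20:05:01.239,2017-02-27T00:00:00.000
?,"to/ibx.apl",143887,?,2017-02-27T20:07:55.627,2017-02-27T20:10:19.514,2017-02-27T00:00:00.000
?,"to/checker.apl",2039600,?,2017-02-27T20:14:37.662,2017-02-27T20:48:37.262,2017-02-27T00:00:00.000
"""

# Parse only the two columns needed for the subtraction.
df = pd.read_csv(StringIO(csv_text), parse_dates=["startDate", "endDate"])

# Subtracting two datetime columns gives a timedelta column.
elapsed = df["endDate"] - df["startDate"]

# total_seconds() returns floats, so sub-minute runs are not rounded away.
df["seconds"] = elapsed.dt.total_seconds()
df["minutes"] = df["seconds"] / 60

print(df[["programName", "seconds", "minutes"]])
```

In these three sample rows the `totaal` column equals the elapsed time in milliseconds, so it may be usable directly; whether that holds for the whole file is an assumption that would need checking.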
Answer 1 (score: 0)
I suggest using pandas.read_csv() instead of pandas.DataFrame.from_csv(). Then I would look at the T that separates the date from the time.
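On the T separator: a small sketch, using two timestamps from the sample data, suggesting that `pd.to_datetime` accepts this ISO 8601 layout as it is, and that an explicit format string has to contain a literal `T`. The format in the question's commented-out line ("%d, %m, %y, ...") does not match these strings. The variable names here are illustrative.

```python
import pandas as pd

# Two timestamps taken from the sample CSV.
raw = pd.Series(["2017-02-27T20:04:07.233", "2017-02-27T20:05:01.239"])

# ISO 8601 strings are parsed without any format argument.
parsed_default = pd.to_datetime(raw)

# An explicit format must mirror the data, including the literal 'T'
# and '%f' for the fractional seconds.
parsed_explicit = pd.to_datetime(raw, format="%Y-%m-%dT%H:%M:%S.%f")

# Both routes should give the same timestamps.
print((parsed_default == parsed_explicit).all())
print(parsed_explicit.dt.time.tolist())
```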
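Answer 2 (score: 0)
Thanks to jezreal, this is what my final solution looks like, and it works fine. I plot the durations in seconds, because with minutes any program that runs for less than 1 minute would be ignored, which is not accurate enough in my case.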
import matplotlib
import pandas as pd
import matplotlib.pyplot as plt
matplotlib.style.use('ggplot')
data = "miFile.csv"
df = pd.read_csv(data,index_col=[2], parse_dates=[4,5,6])
# seconds as floats; in pandas 2.x astype('timedelta64[s]') no longer returns plain numbers
df['duration'] = (df['endDate'] - df['startDate']).dt.total_seconds()
newDf = df[['programName','duration']]
newDf.plot('programName','duration')
plt.show()
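Since programName is a category rather than a continuous axis, a bar chart may read better than a line plot. The following is a sketch under that assumption, using the three sample rows from the question instead of miFile.csv; the axis labels and the `csv_text` name are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Sample rows copied from the question; swap in the real file path when ready.
csv_text = """na,programName,totaal,na,startDate,endDate,Date
?,"to/check.apl",54006,?,2017-02-27T20:04:07.233,2017-02-27T20:05:01.239,2017-02-27T00:00:00.000
?,"to/ibx.apl",143887,?,2017-02-27T20:07:55.627,2017-02-27T20:10:19.514,2017-02-27T00:00:00.000
?,"to/checker.apl",2039600,?,2017-02-27T20:14:37.662,2017-02-27T20:48:37.262,2017-02-27T00:00:00.000
"""

df = pd.read_csv(StringIO(csv_text), parse_dates=["startDate", "endDate"])

# Duration of each run in seconds, as floats.
df["duration"] = (df["endDate"] - df["startDate"]).dt.total_seconds()

# One bar per program; pandas returns the matplotlib Axes it drew on.
ax = df.plot.bar(x="programName", y="duration", legend=False, rot=0)
ax.set_xlabel("program")
ax.set_ylabel("duration (s)")
```

Calling `matplotlib.pyplot.show()` afterwards displays the figure, as in the code above.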