How to plot the duration of programs in Python

Time: 2017-03-08 10:18:02

Tags: python csv pandas matplotlib dataframe

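I am trying to plot the duration of some programs that run overnight. I export the program duration data to a CSV file so that I can analyze it later (like this):

[image: example]

Here are my code and a sample of the CSV: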

CSV:

 na,programName,totaal,na,startDate,endDate,Date
 ?,"to/check.apl",54006,?,2017-02-27T20:04:07.233,2017-02-27T20:05:01.239,2017-02-27T00:00:00.000
 ?,"to/ibx.apl",143887,?,2017-02-27T20:07:55.627,2017-02-27T20:10:19.514,2017-02-27T00:00:00.000
 ?,"to/checker.apl",2039600,?,2017-02-27T20:14:37.662,2017-02-27T20:48:37.262,2017-02-27T00:00:00.000

Python code:

 import matplotlib
 from pandas import *
 import pandas as pd
 import numpy  as np
 import matplotlib.pyplot as plt

 matplotlib.style.use('ggplot')

 data = "miFile.csv"
 df = pd.DataFrame.from_csv(data)
 df = df.set_index('totaal')

 newDf = df[['programName','startDate','endDate']]

So far I get a datetime error, so I tried to fix it as follows (still no luck with the plot):

 newDf['startDate'] = pd.to_datetime(newDf['startDate'])
 newDf['endDate'] = pd.to_datetime(newDf['endDate'])

 #pd.to_datetime(pd.Series(["2017-02-27T20:04:07.233"]) format= "%d, %m, %y, %H: %M: %S")

 newDf.plot('programName','startDate','endDate')

 plt.show()

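3 answers:

Answer 0 (score: 2)

I think you need read_csv to create the df, then take the difference of the two columns, convert the resulting timedelta to minutes, and finally plot: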

from io import StringIO

import pandas as pd
import matplotlib.pyplot as plt

temp=u"""na,programName,totaal,na,startDate,endDate,Date
?,"to/check.apl",54006,?,2017-02-27T20:04:07.233,2017-02-27T20:05:01.239,2017-02-27T00:00:00.000
?,"to/ibx.apl",143887,?,2017-02-27T20:07:55.627,2017-02-27T20:10:19.514,2017-02-27T00:00:00.000
?,"to/checker.apl",2039600,?,2017-02-27T20:14:37.662,2017-02-27T20:48:37.262,2017-02-27T00:00:00.000"""
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), index_col=[2], parse_dates=[4,5,6])

print (df.dtypes)
na                     object
programName            object
na.1                   object
startDate      datetime64[ns]
endDate        datetime64[ns]
Date           datetime64[ns]
dtype: object
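# astype('timedelta64[m]') is rejected by pandas 2.x; floor-dividing the total seconds gives the same whole minutes
df['duration'] = (df['endDate'] - df['startDate']).dt.total_seconds() // 60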
newDf = df[['programName','duration']]
print (newDf)
            programName  duration
totaal                           
54006      to/check.apl       0.0
143887       to/ibx.apl       2.0
2039600  to/checker.apl      33.0

newDf.plot()

plt.show()
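
As a possible variation on the snippet above, here is a minimal sketch that keeps the fractional seconds and the floored minutes side by side, which shows why the shortest program comes out as 0.0. The helper name `load_durations` and the column names `seconds` and `whole_minutes` are made up for illustration; the rows are the sample data from the question.

```python
from io import StringIO

import pandas as pd

# Sample rows copied from the question's CSV.
CSV_TEXT = """na,programName,totaal,na,startDate,endDate,Date
?,"to/check.apl",54006,?,2017-02-27T20:04:07.233,2017-02-27T20:05:01.239,2017-02-27T00:00:00.000
?,"to/ibx.apl",143887,?,2017-02-27T20:07:55.627,2017-02-27T20:10:19.514,2017-02-27T00:00:00.000
?,"to/checker.apl",2039600,?,2017-02-27T20:14:37.662,2017-02-27T20:48:37.262,2017-02-27T00:00:00.000
"""


def load_durations(source):
    """Hypothetical helper: read the CSV and add two duration columns."""
    df = pd.read_csv(source, parse_dates=["startDate", "endDate"])
    elapsed = df["endDate"] - df["startDate"]      # timedelta column
    df["seconds"] = elapsed.dt.total_seconds()     # fractional seconds
    df["whole_minutes"] = df["seconds"] // 60      # floored minutes
    return df[["programName", "totaal", "seconds", "whole_minutes"]]


durations = load_durations(StringIO(CSV_TEXT))
print(durations)
```

In these three sample rows `totaal` appears to equal the elapsed time in milliseconds, so it could serve as a cross-check, though that is only an observation from the sample and not something the question states.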

Answer 1 (score: 0)

I suggest you use pandas.read_csv() instead of pandas.DataFrame.from_csv(). I would then consider removing the T that separates the date from the time.
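
Before stripping the T, it may be worth checking whether pandas already parses these timestamps as they are. The sketch below feeds two of the question's values to `pd.to_datetime` unchanged; the variable names are illustrative only.

```python
import pandas as pd

# Two timestamps taken from the question's sample CSV, with the ISO 8601 'T' intact.
stamps = pd.Series(["2017-02-27T20:04:07.233", "2017-02-27T20:05:01.239"])

# No format string is passed; pandas recognizes the ISO 8601 layout.
parsed = pd.to_datetime(stamps)

print(parsed.dtype)
print(parsed.iloc[1] - parsed.iloc[0])
```

If the values parse cleanly like this, no manual handling of the T should be needed.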

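Answer 2 (score: 0)

Thanks to jezreal, this is what my final solution looks like, and it works fine. I plot in seconds, because with minutes any program that runs for less than 1 minute would be ignored, which is not accurate enough in my case.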

import matplotlib
import pandas as pd
import matplotlib.pyplot as plt

matplotlib.style.use('ggplot')

data = "miFile.csv"
df = pd.read_csv(data,index_col=[2], parse_dates=[4,5,6])

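# .dt.total_seconds() yields float seconds; astype('timedelta64[s]') no longer returns floats in pandas 2.x
df['duration'] = (df['endDate'] - df['startDate']).dt.total_seconds()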
newDf = df[['programName','duration']]

newDf.plot('programName','duration')
plt.show()
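
Since the program names are categories and not points on a continuous axis, a bar chart may read more naturally than a line plot. The following is a minimal sketch under that assumption: `plot_durations` is a hypothetical helper, and the small DataFrame is built by hand from the sample durations so the block runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_durations(df):
    """Hypothetical helper: one horizontal bar per program, length = seconds."""
    fig, ax = plt.subplots()
    ax.barh(df["programName"], df["duration"])
    ax.set_xlabel("duration (s)")
    ax.set_ylabel("program")
    fig.tight_layout()
    return ax


# Durations in seconds for the three sample programs from the question.
sample = pd.DataFrame({
    "programName": ["to/check.apl", "to/ibx.apl", "to/checker.apl"],
    "duration": [54.006, 143.887, 2039.6],
})

ax = plot_durations(sample)
```

Calling `plt.show()` afterwards displays the figure; with the real data, the `duration` column computed above could be passed in place of the hand-built sample.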