How to group data and plot a line chart

Asked: 2017-06-27 07:05:29

Tags: python pandas matplotlib ipython-notebook data-science

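This is my first time using pandas and the IPython notebook, and I can't work out the right search terms for my problem.

I have an .xls file with compile-time data for 3 build servers at 3 sites: A, B and C. These build servers compile multiple projects, so I will pick one specific project at a time. I need to plot the data like this (for one specific project, not all projects in one chart, to keep it simple):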

X-axis = date
Y-axis = average build time on that date

3 lines for sites A, B and C

What I have done so far:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

file=  r'/home/abc/Downloads/request.xls'
df = pd.read_excel(file,parse_dates=['Date'])

build_times = df[['Date','site','project','Duration']]
build_group = build_times.groupby(['Date','site','project']).mean()

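I need help with the following:

  1. How do I select only the successful builds, given that the status column contains 0 and 1?

  2. How do I plot one line per site (A, B and C) for a specific project, using the X-axis and Y-axis described above?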

Edit:

After @jezrael's answer, I was able to get the following data:

    2017-03-27  A   project1    963.200000
                B   project2    4587.176471
                C   project2    1449.375000
                C   project1    1449.375000
      .......
    2017-03-28  A   project1    93.200000
                B   project1    4787.176471
                C   project2    1339.375000
                C   project1    1749.375000
    

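2 Answers:

Answer 0 (score: 2)

I think you need to filter first, using either boolean indexing or query. Note that build_times must still contain the status column for this to work; the selection in the question keeps only Date, site, project and Duration: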

# parentheses let the method chain continue on the next line
build_group = (build_times[build_times['status'] == 1]
               .groupby(['Date','site','project'])['Duration'].mean())

Or:

# same filter, written as a query string
build_group = (build_times.query('status == 1')
               .groupby(['Date','site','project'])['Duration'].mean())
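
As a minimal sketch of the two filters above, the following builds a small made-up frame (the values are illustrative and not taken from the question's spreadsheet; only the column names follow the question) and checks that both approaches keep the same rows before grouping:

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
build_times = pd.DataFrame({
    "Date": pd.to_datetime(["2017-03-27", "2017-03-27", "2017-03-27", "2017-03-28"]),
    "site": ["A", "A", "B", "A"],
    "project": ["project1"] * 4,
    "Duration": [900.0, 1000.0, 4000.0, 90.0],
    "status": [1, 1, 0, 1],  # 1 = successful build, 0 = failed build
})

# Boolean indexing: keep rows where the mask is True.
by_mask = build_times[build_times["status"] == 1]

# query: the same condition expressed as a string.
by_query = build_times.query("status == 1")

# Both selections contain exactly the same rows.
same = by_mask.equals(by_query)

# Average duration per (Date, site, project) over successful builds only.
mean_duration = by_mask.groupby(["Date", "site", "project"])["Duration"].mean()
print(mean_duration)
```

The failed build at site B is dropped before the mean is taken, so site B does not appear in the grouped result at all.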

The result has the same shape as this sample, built from the data in the question:
d={'Duration': [963.2, 4587.176471, 1449.375, 1449.375, 93.2, 4787.176471, 1339.375, 1749.375], 
'project': ['project1', 'project2', 'project2', 'project1', 'project1', 'project1', 'project2', 'project1'], 
'Date': [pd.Timestamp('2017-03-27 00:00:00'), pd.Timestamp('2017-03-27 00:00:00'), pd.Timestamp('2017-03-27 00:00:00'), pd.Timestamp('2017-03-27 00:00:00'), pd.Timestamp('2017-03-28 00:00:00'), pd.Timestamp('2017-03-28 00:00:00'), pd.Timestamp('2017-03-28 00:00:00'), pd.Timestamp('2017-03-28 00:00:00')], 
'site': ['A', 'B', 'C', 'C', 'A', 'B', 'C', 'C']}
build_group = pd.DataFrame(d).set_index(['Date','site','project'])['Duration']
print (build_group)
Date        site  project 
2017-03-27  A     project1     963.200000
            B     project2    4587.176471
            C     project2    1449.375000
                  project1    1449.375000
2017-03-28  A     project1      93.200000
            B     project1    4787.176471
            C     project2    1339.375000
                  project1    1749.375000
Name: Duration, dtype: float64

Then reshape with unstack on level=1 (level 1 is the site level, so each site becomes a column) and select the project with xs. Finally, call plot:

# list the project names to check for typos
print (build_group.index.get_level_values(2).unique().tolist())
['project1', 'project2']

p = 'project1'
build_group = build_group.unstack(level=1).xs(p, level=1, axis=0)
print (build_group)
site            A            B         C
Date                                    
2017-03-27  963.2          NaN  1449.375
2017-03-28   93.2  4787.176471  1749.375

build_group.plot()

[Image: the resulting line plot, one line per site]
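
The steps above can be put together as one self-contained sketch. It reuses the sample values from this answer; the axis labels, the title, the marker style and the Agg backend line are assumptions added for illustration (in a notebook the backend line can be dropped):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this also runs outside a notebook
import pandas as pd

# Sample values copied from the answer above.
d = {
    "Duration": [963.2, 4587.176471, 1449.375, 1449.375,
                 93.2, 4787.176471, 1339.375, 1749.375],
    "project": ["project1", "project2", "project2", "project1",
                "project1", "project1", "project2", "project1"],
    "Date": pd.to_datetime(["2017-03-27"] * 4 + ["2017-03-28"] * 4),
    "site": ["A", "B", "C", "C", "A", "B", "C", "C"],
}
build_group = pd.DataFrame(d).set_index(["Date", "site", "project"])["Duration"]

# Move the site level into columns, then keep only the rows for one project.
per_site = build_group.unstack(level="site").xs("project1", level="project")

# One line per site; markers keep isolated points visible.
ax = per_site.plot(marker="o")
ax.set_xlabel("Date")
ax.set_ylabel("Average build time")
ax.set_title("project1")
```

Site B has no project1 build on 2017-03-27, so its column holds a NaN there. With only one remaining point, no line segment can be drawn for B, which is why the sketch adds markers.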

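Answer 1 (score: 0)

The key method is: DataFrame.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

The reference is available at this link: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.mean.html

Then you can do this: success = df['success'] > 0. This creates a boolean mask named success, and df[success] then gives only the successful rows. Here 'success' is your column containing 1 or 0.

For (2), you can select just the columns you need and plot them with df.plot(*args).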
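
A rough sketch of this second approach follows. The data is made up, the flag column is assumed to be called status as in the question, and pivot_table is swapped in here as one way to get one column per site before plotting; it is not something this answer prescribes:

```python
import pandas as pd

# Hypothetical data; column names follow the question.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-03-27", "2017-03-27", "2017-03-28", "2017-03-28"]),
    "site": ["A", "B", "A", "B"],
    "project": ["project1"] * 4,
    "Duration": [100.0, 200.0, 300.0, 400.0],
    "status": [1, 1, 0, 1],
})

success = df["status"] > 0   # boolean Series (a mask), not a DataFrame
successful = df[success]     # DataFrame with only the successful builds

# Mean duration per date, one column per site, for a single project.
table = (successful[successful["project"] == "project1"]
         .pivot_table(index="Date", columns="site",
                      values="Duration", aggfunc="mean"))
print(table)

# table.plot() would then draw one line per site.
```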