Data analysis using Python Pandas

Time: 2016-10-28 11:19:24

Tags: python pandas

I am new to the Pandas library and need some help. I have two columns like this:

Test Result       Risk Rating
  Fail               Low                   
  Pass               Medium
  Skip               High
  Pass               Low                   
  Fail               Medium
  Pass               High
  Skip               Low                   
  Fail               Medium
  Fail               High
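
For anyone reproducing the problem, the sample above can be rebuilt as a DataFrame with a sketch like the one below. The variable name df is an assumption, chosen to match the name used in the answer; the values are copied from the table.

```python
import pandas as pd

# Sample data copied from the table above; `df` is an assumed variable name.
df = pd.DataFrame({
    'Test Result': ['Fail', 'Pass', 'Skip', 'Pass', 'Fail',
                    'Pass', 'Skip', 'Fail', 'Fail'],
    'Risk Rating': ['Low', 'Medium', 'High', 'Low', 'Medium',
                    'High', 'Low', 'Medium', 'High'],
})

# Totals per test result (Fail 4, Pass 3, Skip 2), the part already solved.
totals = df['Test Result'].value_counts()
print(totals)
```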

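Now I need to find the total number of Fail, Pass and Skip values in the "Test Result" column, which I am able to do. But I also need the total number of rows where "Test Result" is Fail and "Risk Rating" is Low, and likewise for Fail with Medium, and so on. My final result should look like this: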

Fail (Low Risk Rating) = 1
Fail (Medium Risk Rating) = 2
Fail (High Risk Rating) = 1
Pass (Low Risk Rating) = 1
Pass (Medium Risk Rating) = 1
Pass (High Risk Rating) = 1
Skip (Low Risk Rating) = 1
Skip (Medium Risk Rating) = 0
Skip (High Risk Rating) = 1

How can I do this? Any help would be appreciated.

1 Answer:

Answer 0: (score: 3)

I think you need groupby on both columns, aggregating with size:

counts = df.groupby(['Test Result', 'Risk Rating']).size().reset_index(name='counts')
print(counts)
  Test Result Risk Rating  counts
0        Fail        High       1
1        Fail         Low       1
2        Fail      Medium       2
3        Pass        High       1
4        Pass         Low       1
5        Pass      Medium       1
6        Skip        High       1
7        Skip         Low       1

A pivot table built with unstack may be better:

table = df.groupby(['Test Result', 'Risk Rating']).size().unstack(fill_value=0)
print(table)
Risk Rating  High  Low  Medium
Test Result                   
Fail            1    1       2
Pass            1    1       1
Skip            1    1       0

A slower solution with crosstab:

table = pd.crosstab(df['Test Result'], df['Risk Rating'])
print(table)
Risk Rating  High  Low  Medium
Test Result                   
Fail            1    1       2
Pass            1    1       1
Skip            1    1       0

If you also need rows for the missing combinations (here Skip with Medium, which has a count of 0), add stack after unstack.
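
The stack step might look like the following sketch. It is not taken from the answer's own code: the sample data is rebuilt from the question, and the names counts, lookup and the two ordering lists are assumptions introduced here for illustration. Stacking the zero-filled pivot table turns it back into one row per combination, so Skip with Medium shows up with a count of 0.

```python
import pandas as pd

# Sample data rebuilt from the question; `df` is an assumed variable name.
df = pd.DataFrame({
    'Test Result': ['Fail', 'Pass', 'Skip', 'Pass', 'Fail',
                    'Pass', 'Skip', 'Fail', 'Fail'],
    'Risk Rating': ['Low', 'Medium', 'High', 'Low', 'Medium',
                    'High', 'Low', 'Medium', 'High'],
})

# Count each pair, fill absent pairs with 0, then go back to long format.
counts = (
    df.groupby(['Test Result', 'Risk Rating'])
      .size()
      .unstack(fill_value=0)
      .stack()
      .reset_index(name='counts')
)

# Print in the layout requested in the question (ordering lists are assumed).
lookup = counts.set_index(['Test Result', 'Risk Rating'])['counts']
for result in ['Fail', 'Pass', 'Skip']:
    for risk in ['Low', 'Medium', 'High']:
        print(f"{result} ({risk} Risk Rating) = {lookup.loc[(result, risk)]}")
```

The explicit lists only control the print order; without them the rows come out in alphabetical order (High, Low, Medium).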
