Pandas: percentage of the Total row within a MultiIndex

Date: 2018-02-01 16:20:45

Tags: python pandas

My dataframe looks like this:

import pandas as pd

df = pd.DataFrame([['Foo','A','Green',10,20],['Foo','A','Red',20,30],['Foo','A','Total',50,60],['Foo','B','Blue',5,10],['Foo','B','Red',15,25],['Foo','B','Total',40,100],['Foo','C','Orange',25,8],['Foo','C','Total',50,10]],columns = ['Default','Letter','Color','Value','Value2'])
print(df)

      Default Letter   Color  Value  Value2
0     Foo      A   Green     10      20
1     Foo      A     Red     20      30
2     Foo      A   Total     50      60
3     Foo      B    Blue      5      10
4     Foo      B     Red     15      25
5     Foo      B   Total     40     100
6     Foo      C  Orange     25       8
7     Foo      C   Total     50      10

I need to find, within each group, what percentage of the group's Total row each color makes up.

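My first thought was to split them into separate indexes and use .div, but in this case I have a MultiIndex (I know that in my example the first level is Foo everywhere, but that is not what the real data looks like, so just go with it), and I get a NotImplementedError.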

df_color = df[df['Color']!='Total'].set_index(['Default','Letter','Color'])
df_tot = df[df['Color']=='Total'].drop(['Color'],axis = 1).set_index(['Default','Letter'])

df_out = df_color.div(df_tot)

NotImplementedError                       Traceback (most recent call last)
<ipython-input-119-0caf0e2959a6> in <module>()
      4 df_tot = df[df['Color']=='Total'].drop(['Color'],axis = 1).set_index(['Default','Letter'])
      5 
----> 6 df_out = df_color.div(df_tot)
      7 #df.set_index(['Default','Letter','Color'],inplace = True)...
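For reference, one way to make the division work without relying on .div to align a 3-level MultiIndex against a 2-level one (which pandas appears not to implement, hence the error) is to look up each color row's total by its (Default, Letter) key first, so both frames end up with the same index. A minimal sketch, assuming the question's data (with the Value2 column name included) and exactly one Total row per (Default, Letter) pair; the name tot_aligned is only illustrative:

```python
import pandas as pd

df = pd.DataFrame(
    [['Foo', 'A', 'Green', 10, 20], ['Foo', 'A', 'Red', 20, 30], ['Foo', 'A', 'Total', 50, 60],
     ['Foo', 'B', 'Blue', 5, 10], ['Foo', 'B', 'Red', 15, 25], ['Foo', 'B', 'Total', 40, 100],
     ['Foo', 'C', 'Orange', 25, 8], ['Foo', 'C', 'Total', 50, 10]],
    columns=['Default', 'Letter', 'Color', 'Value', 'Value2'])

df_color = df[df['Color'] != 'Total'].set_index(['Default', 'Letter', 'Color'])
df_tot = df[df['Color'] == 'Total'].drop(columns='Color').set_index(['Default', 'Letter'])

# Repeat each (Default, Letter) total once per color row in that group ...
tot_aligned = df_tot.reindex(df_color.index.droplevel('Color'))
# ... and give it the same 3-level index so the division lines up row by row
tot_aligned.index = df_color.index

df_out = df_color / tot_aligned
print(df_out)
```

reindex accepts a target with repeated labels as long as df_tot's own index is unique, which is what lets one Total row serve several colors.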

This is my desired output:

df_out = pd.DataFrame([['Foo','A','Green',.2,.333],['Foo','A','Red',.4,.5],['Foo','B','Blue',.125,.1],['Foo','B','Red',.375,.25],['Foo','C','Orange',.5,.8]],columns = ['Default','Letter','Color','Value','Value2'])
print(df_out)

Edit: note that there are actually many value columns. For simplicity I only show a few here, but the solution needs to handle 50-100 numeric columns.
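Since there may be 50-100 value columns, a sketch that never names them individually may help. It assumes every numeric column is a value column (picked with select_dtypes) and that each (Default, Letter) group has exactly one Total row, but it does not assume where that row sits within the group. Names such as is_total and group_tot are illustrative only:

```python
import pandas as pd

df = pd.DataFrame(
    [['Foo', 'A', 'Green', 10, 20], ['Foo', 'A', 'Red', 20, 30], ['Foo', 'A', 'Total', 50, 60],
     ['Foo', 'B', 'Blue', 5, 10], ['Foo', 'B', 'Red', 15, 25], ['Foo', 'B', 'Total', 40, 100],
     ['Foo', 'C', 'Orange', 25, 8], ['Foo', 'C', 'Total', 50, 10]],
    columns=['Default', 'Letter', 'Color', 'Value', 'Value2'])

keys = ['Default', 'Letter', 'Color']
# Every numeric column is treated as a value column, however many there are
cols = df.select_dtypes('number').columns
is_total = df['Color'].eq('Total')

# Keep only the Total rows' numbers; every other row becomes NaN
total_vals = df.loc[is_total, cols].reindex(df.index)
# Spread each (Default, Letter) group's total to all rows of that group ('max' skips NaN)
group_tot = total_vals.groupby([df['Default'], df['Letter']]).transform('max')

pct = df.loc[~is_total, cols] / group_tot.loc[~is_total]
df_out = pd.concat([df.loc[~is_total, keys], pct], axis=1).reset_index(drop=True)
print(df_out)
```

Because transform('max') skips NaN, a group with no Total row would produce NaN percentages rather than raising.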

2 Answers:

Answer 0 (score: 0)

You can do this with groupby. Check out the tutorial on groupby.

Note: this implementation assumes that the Total row for each letter is the last row for that letter (as in the example), but this is easy to modify.


cols = [x for x in df.columns if x not in ['Default', 'Letter', 'Color']]  # or df.columns[3:]
# Divide each (Default, Letter) group's value columns by that group's last row (its Total row)
df[cols] = df.groupby(['Default', 'Letter'], group_keys=False)[cols].apply(lambda g: g / g.iloc[-1])
df[df['Color'] != 'Total']
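The note above says the last-row assumption is easy to lift. A sketch of one such modification, assuming the question's columns: sort on a temporary helper flag (the _is_total name is hypothetical) so that each group's Total row ends up last, then apply the same divide-by-last-row step. The sample data deliberately puts group A's Total row first:

```python
import pandas as pd

# Question's data, but with group A's Total row placed first on purpose
df = pd.DataFrame(
    [['Foo', 'A', 'Total', 50, 60], ['Foo', 'A', 'Green', 10, 20], ['Foo', 'A', 'Red', 20, 30],
     ['Foo', 'B', 'Blue', 5, 10], ['Foo', 'B', 'Red', 15, 25], ['Foo', 'B', 'Total', 40, 100],
     ['Foo', 'C', 'Orange', 25, 8], ['Foo', 'C', 'Total', 50, 10]],
    columns=['Default', 'Letter', 'Color', 'Value', 'Value2'])

cols = [c for c in df.columns if c not in ['Default', 'Letter', 'Color']]

# False sorts before True, so each group's Total row moves to the end of its group
df = (df.assign(_is_total=df['Color'].eq('Total'))
        .sort_values(['Default', 'Letter', '_is_total'])
        .drop(columns='_is_total'))

# Same idea as the answer: divide every row of a group by the group's last (Total) row
df[cols] = df.groupby(['Default', 'Letter'], group_keys=False)[cols].apply(lambda g: g / g.iloc[-1])
result = df[df['Color'] != 'Total']
print(result)
```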

Answer 1 (score: 0)

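I ended up using the melt function to reshape the dataframe so that the column names become another column in the data. Then I could simply merge and divide, and reshape back at the end: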

import pandas as pd

df = pd.DataFrame([['Foo','A','Green',10,20],['Foo','A','Red',20,30],['Foo','A','Total',50,60],['Foo','B','Blue',5,10],['Foo','B','Red',15,25],['Foo','B','Total',40,100],['Foo','C','Orange',25,8],['Foo','C','Total',50,10]],columns = ['Default','Letter','Color','Value','Value2'])

df_color = df[df['Color']!='Total']
df_tot = df[df['Color']=='Total'].drop(['Color'],axis = 1)

# Long format: one row per (Default, Letter, Color, value column)
df_melt = pd.melt(df_color, id_vars=['Default','Letter','Color'], var_name='value_field')
df_tot_melt = pd.melt(df_tot, id_vars=['Default','Letter'], var_name='value_field', value_name='Total')

# Attach each group's total to its color rows, then divide
df_melt_pct = pd.merge(df_melt, df_tot_melt, how='outer', on=['Default','Letter','value_field'])
df_melt_pct['Pct'] = df_melt_pct['value'] / df_melt_pct['Total']
# Back to wide format: one column per original value column
df_melt_pct = df_melt_pct.drop(['value','Total'], axis=1).set_index(['Default','Letter','Color','value_field']).unstack()
df_melt_pct.columns = df_melt_pct.columns.droplevel(0)

print(df_melt_pct)

value_field            Value    Value2
Default Letter Color                  
Foo     A      Green   0.200  0.333333
               Red     0.400  0.500000
        B      Blue    0.125  0.100000
               Red     0.375  0.250000
        C      Orange  0.500  0.800000
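The merge-and-divide idea can also work without melting first: merge the totals onto the color rows side by side and divide the value columns by their matching total columns. A minimal sketch, assuming the question's data and that every non-key column is a value column; the _tot suffix is an arbitrary choice:

```python
import pandas as pd

df = pd.DataFrame(
    [['Foo', 'A', 'Green', 10, 20], ['Foo', 'A', 'Red', 20, 30], ['Foo', 'A', 'Total', 50, 60],
     ['Foo', 'B', 'Blue', 5, 10], ['Foo', 'B', 'Red', 15, 25], ['Foo', 'B', 'Total', 40, 100],
     ['Foo', 'C', 'Orange', 25, 8], ['Foo', 'C', 'Total', 50, 10]],
    columns=['Default', 'Letter', 'Color', 'Value', 'Value2'])

cols = [c for c in df.columns if c not in ['Default', 'Letter', 'Color']]
tot_cols = [c + '_tot' for c in cols]

df_color = df[df['Color'] != 'Total']
df_tot = df.loc[df['Color'] == 'Total', ['Default', 'Letter'] + cols]

# Totals land next to each color row; the overlapping total columns get the '_tot' suffix
merged = df_color.merge(df_tot, on=['Default', 'Letter'], suffixes=('', '_tot'))
# Divide positionally with NumPy so 'Value' pairs with 'Value_tot', etc.
merged[cols] = merged[cols].to_numpy() / merged[tot_cols].to_numpy()

df_out = merged.drop(columns=tot_cols).set_index(['Default', 'Letter', 'Color'])
print(df_out)
```

This keeps the data wide throughout, at the cost of temporarily doubling the number of value columns in merged.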