My dataframe looks like this:
import pandas as pd
df = pd.DataFrame([['Foo','A','Green',10,20],['Foo','A','Red',20,30],['Foo','A','Total',50,60],['Foo','B','Blue',5,10],['Foo','B','Red',15,25],['Foo','B','Total',40,100],['Foo','C','Orange',25,8],['Foo','C','Total',50,10]],columns = ['Default','Letter','Color','Value','Value2'])
print(df)
  Default Letter   Color  Value  Value2
0     Foo      A   Green     10      20
1     Foo      A     Red     20      30
2     Foo      A   Total     50      60
3     Foo      B    Blue      5      10
4     Foo      B     Red     15      25
5     Foo      B   Total     40     100
6     Foo      C  Orange     25       8
7     Foo      C   Total     50      10
I need to find, within each group, what fraction of the group's Total row each color accounts for.
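Written as a formula: take a group g (a Default/Letter pair), a non-Total color c in that group, and a value column v. The requested ratio is the color's value divided by the group's Total value in the same column. Checked against the example rows for group A:

```latex
\text{share}_{g,c,v} = \frac{x_{g,c,v}}{x_{g,\mathrm{Total},v}},
\qquad \text{e.g.}\quad
\text{share}_{A,\mathrm{Green},\mathrm{Value}} = \frac{10}{50} = 0.2,
\quad
\text{share}_{A,\mathrm{Green},\mathrm{Value2}} = \frac{20}{60} \approx 0.333
```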
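My first idea was to split them into separately indexed frames and use .div. In this case, though, I have a MultiIndex (I know that in my example the first column always says Foo, but that's not what the real data looks like; just roll with it), and I got a NotImplementedError: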
df_color = df[df['Color']!='Total'].set_index(['Default','Letter','Color'])
df_tot = df[df['Color']=='Total'].drop(['Color'],axis = 1).set_index(['Default','Letter'])
df_out = df_color.div(df_tot)
NotImplementedError Traceback (most recent call last)
<ipython-input-119-0caf0e2959a6> in <module>()
4 df_tot = df[df['Color']=='Total'].drop(['Color'],axis = 1).set_index(['Default','Letter'])
5
----> 6 df_out = df_color.div(df_tot)
7 #df.set_index(['Default','Letter','Color'],inplace = True)...
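One possible workaround avoids the MultiIndex-to-MultiIndex alignment that raised the error. It expands the Total frame to the color rows' index with `reindex` and then divides. This is a sketch that assumes each (Default, Letter) pair has exactly one Total row. It uses only standard pandas calls (`Index.droplevel`, `DataFrame.reindex`, `DataFrame.set_axis`, `DataFrame.div`).

```python
import pandas as pd

df = pd.DataFrame(
    [['Foo', 'A', 'Green', 10, 20], ['Foo', 'A', 'Red', 20, 30], ['Foo', 'A', 'Total', 50, 60],
     ['Foo', 'B', 'Blue', 5, 10], ['Foo', 'B', 'Red', 15, 25], ['Foo', 'B', 'Total', 40, 100],
     ['Foo', 'C', 'Orange', 25, 8], ['Foo', 'C', 'Total', 50, 10]],
    columns=['Default', 'Letter', 'Color', 'Value', 'Value2'])

df_color = df[df['Color'] != 'Total'].set_index(['Default', 'Letter', 'Color'])
df_tot = df[df['Color'] == 'Total'].drop(columns='Color').set_index(['Default', 'Letter'])

# Repeat each group's Total row once per color row: dropping the 'Color' level
# leaves a (Default, Letter) index with one entry per color row.
tot_aligned = df_tot.reindex(df_color.index.droplevel('Color'))
# Give the expanded totals exactly the color rows' index so .div lines up 1:1.
tot_aligned = tot_aligned.set_axis(df_color.index, axis=0)

df_out = df_color.div(tot_aligned)
print(df_out)
```

Because both frames end up with identical row and column labels, the division is a plain element-wise operation. No join between MultiIndexes with different levels is needed.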
This is the output I want:
df_out = pd.DataFrame([['Foo','A','Green',.2,.333],['Foo','A','Red',.4,.5],['Foo','B','Blue',.125,.1],['Foo','B','Red',.375,.25],['Foo','C','Orange',.5,.8]],columns = ['Default','Letter','Color','Value','Value2'])
print(df_out)
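EDIT: Note that there are actually many value columns. For simplicity only a couple are shown here, but the solution needs to handle 50-100 numeric columns.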
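With 50-100 value columns, one option is to select the columns by dtype instead of by name. `groupby(...).transform('last')` can then broadcast each group's last row back onto every row of that group. This sketch rests on two assumptions: every numeric column is a value column, and the Total row is the last row of its (Default, Letter) group, as in the example. The helper name `color_shares` and the wide demo frame are hypothetical and are only there for illustration.

```python
import pandas as pd


def color_shares(df, keys=('Default', 'Letter'), label_col='Color'):
    """Divide every numeric column by its group's last row (assumed to be 'Total')."""
    keys = list(keys)
    # Pick up however many numeric value columns there are.
    cols = df.select_dtypes('number').columns
    # transform('last') gives every row the last (non-null) value of its group.
    totals = df.groupby(keys)[cols].transform('last')
    out = df.copy()
    out[cols] = df[cols] / totals
    # Drop the Total rows themselves; they would all be 1.0.
    return out[out[label_col] != 'Total'].reset_index(drop=True)


# Made-up wide frame: 2 groups x (2 colors + Total), 60 identical value columns.
wide = pd.DataFrame({
    'Default': ['Foo'] * 6,
    'Letter': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Color': ['Green', 'Red', 'Total', 'Blue', 'Red', 'Total'],
    **{f'v{i}': [1, 3, 4, 2, 6, 8] for i in range(60)},
})
print(color_shares(wide).shape)  # (4, 63)
```

Unlike a per-group `apply`, `transform('last')` runs as a single vectorized pass over all selected columns. That tends to matter once the column count grows.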
Answer 0 (score: 0)
You can do this with groupby. Check out the groupby tutorial.
Note: this implementation assumes that each group's Total entry is the last row of that group (as in the example), but this is easy to change.
cols = [x for x in df.columns if x not in ['Default', 'Letter', 'Color']]  # or df.columns[3:]
# divide every row of a (Default, Letter) group by that group's last row, i.e. its Total row
df[cols] = df.groupby(['Default', 'Letter'], group_keys=False)[cols].apply(lambda g: g / g.iloc[-1])
df_out = df[df['Color'] != 'Total']
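Relying on the Total row being last in its group breaks if the rows are ever reordered. One way to drop that assumption is to look up the Total row by its label. Every non-Total row is masked to NaN, and `transform('first')`, which skips NaN, then spreads each group's Total values onto all of its rows. This sketch assumes exactly one row per (Default, Letter) group has `Color == 'Total'`. The data is the question's, with the B Total row deliberately moved to the top of its group.

```python
import pandas as pd

# Question data, except the B Total row is moved to the top of its group.
df = pd.DataFrame(
    [['Foo', 'A', 'Green', 10, 20], ['Foo', 'A', 'Red', 20, 30], ['Foo', 'A', 'Total', 50, 60],
     ['Foo', 'B', 'Total', 40, 100], ['Foo', 'B', 'Blue', 5, 10], ['Foo', 'B', 'Red', 15, 25],
     ['Foo', 'C', 'Orange', 25, 8], ['Foo', 'C', 'Total', 50, 10]],
    columns=['Default', 'Letter', 'Color', 'Value', 'Value2'])

cols = df.columns[3:]            # value columns
is_total = df['Color'] == 'Total'

# Blank out every non-Total row (NaN), then copy each group's Total values onto all
# rows of that group; 'first' picks the first non-null value per column.
totals = (df[cols].where(is_total)
          .groupby([df['Default'], df['Letter']])
          .transform('first'))

df_out = df[~is_total].copy()
df_out[cols] = df_out[cols] / totals[~is_total]
print(df_out)
```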
Answer 1 (score: 0)
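I ended up using melt to reshape the dataframe so that the column names became another column in the data. Then I could simply merge and divide, and reshape back at the end.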
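The following sketch shows what the melt step does, run on the Total rows only. Each (Default, Letter) row becomes one row per value column, and the former column name is stored in `value_field`. The variable names follow the answer's code; the small frame is just the question's Total rows.

```python
import pandas as pd

df_tot = pd.DataFrame(
    [['Foo', 'A', 50, 60], ['Foo', 'B', 40, 100], ['Foo', 'C', 50, 10]],
    columns=['Default', 'Letter', 'Value', 'Value2'])

# Wide -> long: one row per (Default, Letter, value column); all 'Value' rows
# come first, then all 'Value2' rows.
df_tot_melt = pd.melt(df_tot, id_vars=['Default', 'Letter'],
                      var_name='value_field', value_name='Total')
print(df_tot_melt)
```

Once the color rows are melted the same way, both frames share the key columns (Default, Letter, value_field). An ordinary merge on those keys then puts each value next to its Total, however many value columns there were.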
import pandas as pd
df = pd.DataFrame([['Foo','A','Green',10,20],['Foo','A','Red',20,30],['Foo','A','Total',50,60],['Foo','B','Blue',5,10],['Foo','B','Red',15,25],['Foo','B','Total',40,100],['Foo','C','Orange',25,8],['Foo','C','Total',50,10]],columns = ['Default','Letter','Color','Value','Value2'])
df_color = df[df['Color']!='Total']
df_tot = df[df['Color']=='Total'].drop(['Color'],axis = 1)
# wide -> long: the value column names go into 'value_field' (var_name must be a scalar)
df_melt = pd.melt(df_color,id_vars = ['Default','Letter', 'Color'],var_name = 'value_field')
df_tot_melt = pd.melt(df_tot,id_vars = ['Default','Letter'],var_name = 'value_field', value_name = 'Total')
# attach each group's Total to its color rows, then divide
df_melt_pct = pd.merge(df_melt, df_tot_melt, how = 'left', on = ['Default','Letter','value_field'])
df_melt_pct['Pct'] = df_melt_pct['value'] /df_melt_pct['Total']
# long -> wide again, one column per value_field
df_melt_pct = df_melt_pct.drop(['value','Total'],axis = 1).set_index(['Default','Letter','Color','value_field']).unstack()
df_melt_pct.columns = df_melt_pct.columns.droplevel(0)
print(df_melt_pct)
value_field           Value    Value2
Default Letter Color
Foo     A      Green  0.200  0.333333
               Red    0.400  0.500000
        B      Blue   0.125  0.100000
               Red    0.375  0.250000
        C      Orange 0.500  0.800000
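A variation on the same merge-then-divide idea skips the melt/unstack round trip. It merges the Total rows back onto the color rows in wide form with a suffix, then divides the matching column blocks. This is a sketch, not the answer's code. The `_tot` suffix is an arbitrary choice, and `validate='many_to_one'` encodes the assumption of exactly one Total row per (Default, Letter).

```python
import pandas as pd

df = pd.DataFrame(
    [['Foo', 'A', 'Green', 10, 20], ['Foo', 'A', 'Red', 20, 30], ['Foo', 'A', 'Total', 50, 60],
     ['Foo', 'B', 'Blue', 5, 10], ['Foo', 'B', 'Red', 15, 25], ['Foo', 'B', 'Total', 40, 100],
     ['Foo', 'C', 'Orange', 25, 8], ['Foo', 'C', 'Total', 50, 10]],
    columns=['Default', 'Letter', 'Color', 'Value', 'Value2'])

keys = ['Default', 'Letter']
cols = [c for c in df.columns if c not in keys + ['Color']]

df_color = df[df['Color'] != 'Total']
df_tot = df.loc[df['Color'] == 'Total', keys + cols]

# Attach each group's Total values as extra columns named '<col>_tot'.
merged = df_color.merge(df_tot, on=keys, how='left',
                        suffixes=('', '_tot'), validate='many_to_one')

# Divide the value block by the Total block column-for-column (same order).
tot_block = merged[[c + '_tot' for c in cols]].to_numpy()
df_out = merged[keys + ['Color']].join(merged[cols] / tot_block)
print(df_out)
```

The result stays in the original wide layout, so no reshaping back is needed at the end. The trade-off is that the merged frame temporarily holds twice as many value columns.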