I have a DataFrame like this:
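customer_id | date     | category
1           | 2017-2-1 | toys
2           | 2017-2-1 | food
1           | 2017-2-1 | drinks
3           | 2017-2-2 | computer
2           | 2017-2-1 | toys
1           | 2017-3-1 | food

I want to turn the values of the category column into new columns and one-hot encode them. I know I can use df.pivot_table(index = ['customer_id'], columns = ['category']), but I also want to group by date, so that each row only contains information from a single date. For example, in the desired output, id 1 would have two rows, because the date column contains two unique dates for that customer.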
Answer 0 (score: 2)
You may be looking for crosstab:
pd.crosstab([df.customer_id,df.date],df.category).reset_index(level=1,drop=True)
Out[102]:
category     computer  drinks  food  toys
customer_id
1                   0       1     0     1
1                   0       0     1     0
2                   0       0     1     1
3                   1       0     0     0
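For reference, here is a minimal self-contained sketch of the crosstab approach that rebuilds the sample data from the question. Two things in it are assumptions rather than part of the answer above: it keeps date as a regular column instead of dropping it, and it clips the counts to 1 so the result stays a 0/1 encoding even if a customer buys the same category twice on one day. The names counts, encoded, and result are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customer_id": [1, 2, 1, 3, 2, 1],
    "date": ["2017-2-1", "2017-2-1", "2017-2-1", "2017-2-2", "2017-2-1", "2017-3-1"],
    "category": ["toys", "food", "drinks", "computer", "toys", "food"],
})

# Count how often each category occurs per (customer_id, date) pair.
counts = pd.crosstab([df["customer_id"], df["date"]], df["category"])

# Cap counts at 1 so repeated purchases still yield a 0/1 indicator (assumption).
encoded = counts.clip(upper=1)

# Move date out of the index into a regular column instead of dropping it.
result = encoded.reset_index(level="date")
print(result)
```

Keeping date makes it possible to tell the two rows for customer 1 apart; drop it with reset_index(level=1, drop=True), as in the answer, if it is not needed.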
Answer 1 (score: 0)
Assuming your frame is called df, you can add an indicator column and then use .pivot_table directly:
df['Indicator'] = 1
pvt = df.pivot_table(index=['date', 'customer_id'],
                     columns='category',
                     values='Indicator')\
        .fillna(0)
This gives a DataFrame like the following:
category              computer  drinks  food  toys
date     customer_id
2017-2-1 1                 0.0     1.0   0.0   1.0
         2                 0.0     0.0   1.0   1.0
2017-2-2 3                 1.0     0.0   0.0   0.0
2017-3-1 1                 0.0     0.0   1.0   0.0
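The floats in this output come from pivot_table's default mean aggregation combined with the NaN values for missing combinations. A possible variant, sketched here as an assumption and not as part of the answer above, passes aggfunc="max" and fill_value=0 and then casts to int, which should give integer 0/1 columns:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "customer_id": [1, 2, 1, 3, 2, 1],
    "date": ["2017-2-1", "2017-2-1", "2017-2-1", "2017-2-2", "2017-2-1", "2017-3-1"],
    "category": ["toys", "food", "drinks", "computer", "toys", "food"],
})

# Indicator column: every existing row marks one (date, customer, category) hit.
df["Indicator"] = 1

pvt = (
    df.pivot_table(
        index=["date", "customer_id"],
        columns="category",
        values="Indicator",
        aggfunc="max",   # stays at 1 even if a combination appears more than once
        fill_value=0,    # missing combinations become 0 instead of NaN
    )
    .astype(int)         # explicit cast so the indicators are integers
)
print(pvt)
```

The explicit astype(int) is there because whether fill_value alone yields integer columns can differ between pandas versions.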