Question

I am trying to restructure the data in a DataFrame:

year  month  count  reason 
2001  1     1       a
2001  2     3       b
2001  3     4       c
2005  1     4       a
2005  1     3       c

The new DataFrame should look like this:

year  month  count  reason_a  reason_b  reason_c  
2001  1      1      1         0         0
2001  2      3      0         3         0
2001  3      4      0         0         4
2005  1      7      4         0         3

Could someone show some Python code for this? Thanks in advance.

Answer 1

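Use:

DataFrame.join(): joins the columns of another DataFrame.
pandas.get_dummies(): converts a categorical variable into dummy/indicator variables.
DataFrame.mul(): multiplies a DataFrame element-wise with another object (binary operator mul).
DataFrame.groupby(): groups a DataFrame or Series using a mapper or a Series of columns.
DataFrameGroupBy.agg(): aggregates using a callable, string, dict, or list of strings/callables.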

For example:

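import pandas as pd

# One indicator column per reason, multiplied by that row's count
dummies = df.join(pd.get_dummies(df["reason"], prefix="reason", dtype=int).mul(df["count"], axis=0))
# Sum every column within each (year, month) group; 'first'/'last' would only work by coincidence
f = {"count": "sum", "reason_a": "sum", "reason_b": "sum", "reason_c": "sum"}
df1 = dummies.groupby(["year", "month"], sort=False, as_index=False).agg(f)
print(df1)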

Output:

   year  month  count  reason_a  reason_b  reason_c
0  2001      1      1         1         0         0
1  2001      2      3         0         3         0
2  2001      3      4         0         0         4
3  2005      1      7         4         0         3
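If the set of reasons is not known in advance, the same idea can be written without listing the reason_* columns by hand. The following is only a sketch under that assumption: the sample data is copied from the question, and the names weighted and result are placeholders chosen for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year":   [2001, 2001, 2001, 2005, 2005],
    "month":  [1, 2, 3, 1, 1],
    "count":  [1, 3, 4, 4, 3],
    "reason": ["a", "b", "c", "a", "c"],
})

# One indicator column per reason, scaled by that row's count.
weighted = pd.get_dummies(df["reason"], prefix="reason", dtype=int).mul(df["count"], axis=0)

# Drop the text column, attach the weighted indicators, and sum
# every remaining column within each (year, month) group.
result = (
    df.drop(columns="reason")
      .join(weighted)
      .groupby(["year", "month"], sort=False, as_index=False)
      .sum()
)
print(result)
```

Because every column is summed, a new reason value in the data simply produces an extra reason_* column, with no change to the code.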

Answer 2

Using pivot_table:

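# aggfunc="sum" adds up duplicate (year, month, reason) rows instead of averaging them (the default is mean)
df2 = pd.pivot_table(df, index=["year", "month"], values=["count"], columns="reason", aggfunc="sum").reset_index().fillna(0)
# Flatten the MultiIndex columns: ("count", "a") -> "reason_a", ("year", "") -> "year"
df2.columns = [i[0] if i[0] != "count" else f"reason_{i[1]}" for i in df2.columns]
# Row total across the reason_* columns
df2["count"] = df2.filter(like="reason_").sum(axis=1)
print(df2)
# Output: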
   year  month  reason_a  reason_b  reason_c  count
0  2001      1       1.0       0.0       0.0    1.0
1  2001      2       0.0       3.0       0.0    3.0
2  2001      3       0.0       0.0       4.0    4.0
3  2005      1       4.0       0.0       3.0    7.0
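The output above is floating point because the missing combinations are first created as NaN and only then filled with 0. If integer columns in the question's column order are preferred, a variant along these lines might work. It is a sketch, not a definitive version: the names wide and result are placeholders, and the explicit astype(int) is there because whether pivot_table already returns integers with fill_value=0 may depend on the pandas version.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "year":   [2001, 2001, 2001, 2005, 2005],
    "month":  [1, 2, 3, 1, 1],
    "count":  [1, 3, 4, 4, 3],
    "reason": ["a", "b", "c", "a", "c"],
})

# Passing values as a plain string keeps the columns single-level (a, b, c).
# fill_value=0 fills (year, month, reason) combinations that never occur.
wide = pd.pivot_table(
    df, index=["year", "month"], columns="reason",
    values="count", aggfunc="sum", fill_value=0,
).astype(int)  # explicit cast so the result is integer regardless of version

wide = wide.add_prefix("reason_")               # a -> reason_a, ...
wide.insert(0, "count", wide.sum(axis=1))       # row total, placed first
wide.columns.name = None                        # drop the leftover "reason" label
result = wide.reset_index()
print(result)
```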
