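Given a DataFrame whose groups (defined by some "groupby" variable) contain different numbers of elements, I need to reshape it into a matrix with a predefined number of columns. For example: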
summary_x participant_id_x response_date cuts
0 3.0 11 2016-05-05 a
1 3.0 11 2016-05-06 a
2 4.0 11 2016-05-07 a
3 4.0 11 2016-05-08 a
4 3.0 11 2016-05-09 a
5 3.0 11 2016-05-10 a
6 3.0 11 2016-05-11 a
7 3.0 11 2016-05-12 a
8 3.0 11 2016-05-13 a
9 3.0 11 2016-05-14 a
13 4.0 11 2016-05-22 b
14 4.0 11 2016-05-23 b
15 3.0 11 2016-05-24 b
16 3.0 11 2016-05-25 b
17 3.0 11 2016-05-26 b
18 3.0 11 2016-05-27 b
19 3.0 11 2016-05-28 b
20 3.0 11 2016-06-02 c
21 3.0 11 2016-06-03 c
22 3.0 11 2016-06-04 c
23 3.0 11 2016-06-05 c
24 3.0 11 2016-06-06 c
25 3.0 11 2016-06-07 c
26 3.0 11 2016-06-08 c
27 3.0 11 2016-06-09 c
28 3.0 11 2016-06-10 c
29 5.0 11 2016-06-11 c
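Each group (by 'cuts') contains 10 elements, but group 'b' contains only 7. I want to reshape the values of 'summary_x' into a (3, 10) matrix, with the missing values filled with NaN, i.e. something like: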
pd.DataFrame(df.summary_x.values.reshape((-1,10)))
0 1 2 3 4 5 6 7 8 9
0 3.0 3.0 4.0 4.0 3.0 3.0 3.0 3.0 3.0 3.0
1 nan nan nan 4.0 4.0 3.0 3.0 3.0 3.0 3.0
2 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 5.0
Any solutions?
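To experiment with the answers, the sample frame from the question can be rebuilt as shown below. This is a sketch that makes one assumption: response_date holds real dates, not strings. It keeps the gap in the index, because labels 10 to 12 are also missing from the question's table. The answers only use the cuts and summary_x columns.

```python
import pandas as pd

# summary_x values copied from the question's table, group by group
summary = ([3.0, 3.0, 4.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0]   # cuts 'a'
           + [4.0, 4.0, 3.0, 3.0, 3.0, 3.0, 3.0]                # cuts 'b'
           + [3.0] * 9 + [5.0])                                 # cuts 'c'
# consecutive daily dates per group (assumption: stored as datetimes)
dates = (list(pd.date_range('2016-05-05', periods=10))
         + list(pd.date_range('2016-05-22', periods=7))
         + list(pd.date_range('2016-06-02', periods=10)))
df = pd.DataFrame(
    {'summary_x': summary,
     'participant_id_x': 11,                 # scalar is broadcast to every row
     'response_date': dates,
     'cuts': ['a'] * 10 + ['b'] * 7 + ['c'] * 10},
    index=list(range(10)) + list(range(13, 30)),  # labels 10-12 are absent
)
print(df.groupby('cuts').size())
```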
Answer 0 (score: 1)
You can number the rows within each group from the end with cumcount(ascending=False), then reverse the column order with [::-1]:
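import numpy as np
import pandas as pd
# number rows within each group from the end (last value -> 0), pivot, then reverse the columns
g = df.groupby('cuts').cumcount(ascending=False)
out = (df.assign(pos=g).pivot(index='cuts', columns='pos', values='summary_x')
         .iloc[:, ::-1].reset_index(drop=True))
out.columns = np.arange(len(out.columns))
print(out)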
0 1 2 3 4 5 6 7 8 9
0 3.0 3.0 4.0 4.0 3.0 3.0 3.0 3.0 3.0 3.0
1 NaN NaN NaN 4.0 4.0 3.0 3.0 3.0 3.0 3.0
2 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 5.0
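The question asks for a predefined number of columns. The code above instead takes the width from the largest group. The variant below takes the width explicitly. It is a sketch that assumes NaN padding belongs on the left and that no group is longer than width. The helper name left_pad_groups and the small frame are made up for illustration.

```python
import numpy as np
import pandas as pd

def left_pad_groups(df, key, value, width):
    """Hypothetical helper: one row per group, values right-aligned in `width` columns."""
    # position counted from the end of each group: last value -> 0
    pos = df.groupby(key).cumcount(ascending=False)
    if pos.max() >= width:
        raise ValueError('a group has more than `width` elements')
    wide = df.assign(_pos=pos).pivot(index=key, columns='_pos', values=value)
    # reindex yields exactly `width` columns (missing ones are all NaN), ordered
    # width-1 ... 0 so that each group's last value lands in the last column
    wide = wide.reindex(columns=range(width - 1, -1, -1))
    wide.columns = np.arange(width)
    return wide.reset_index(drop=True)

# small made-up frame; width=4 stands in for the question's 10
df = pd.DataFrame({'cuts': list('aaabcc'), 'summary_x': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
print(left_pad_groups(df, 'cuts', 'summary_x', 4))
```

Because reindex fixes the column set explicitly, the result keeps the requested width even when every group is shorter than width.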
Another solution:
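import numpy as np
import pandas as pd
# reverse the rows, so each group's list starts with its last value
L = df[::-1].groupby('cuts')['summary_x'].apply(list).values.tolist()
# shorter lists are NaN-padded on the right; reversing the columns moves the NaNs left
out = pd.DataFrame(L).iloc[:, ::-1]
out.columns = np.arange(len(out.columns))
print(out)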
0 1 2 3 4 5 6 7 8 9
0 3.0 3.0 4.0 4.0 3.0 3.0 3.0 3.0 3.0 3.0
1 NaN NaN NaN 4.0 4.0 3.0 3.0 3.0 3.0 3.0
2 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 5.0
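The same result can also be reached without pivoting. This is a different technique from the two above: preallocate a NaN-filled NumPy array of the target shape, then copy each group's values into the right end of its row. It is a sketch that assumes groups appear in sorted key order (the groupby default) and that no group exceeds width. The frame and width = 4 are illustrative only.

```python
import numpy as np
import pandas as pd

# small made-up frame; width=4 stands in for the question's 10
df = pd.DataFrame({'cuts': list('aaabcc'), 'summary_x': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
width = 4  # predefined number of columns

# one NumPy array of values per group, in sorted group-key order
groups = [g.to_numpy() for _, g in df.groupby('cuts')['summary_x']]
out = np.full((len(groups), width), np.nan)  # start from an all-NaN matrix
for row, values in enumerate(groups):
    out[row, width - len(values):] = values  # right-align; NaNs stay on the left
print(pd.DataFrame(out))
```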
If the NaNs may go at the end of each row instead, a plain cumcount is enough:
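import pandas as pd
# number rows within each group from the start; shorter groups leave NaN in the last columns
g = df.groupby('cuts').cumcount()
out = (df.assign(pos=g).pivot(index='cuts', columns='pos', values='summary_x')
         .reset_index(drop=True).rename_axis(columns=None))
print(out)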
0 1 2 3 4 5 6 7 8 9
0 3.0 3.0 4.0 4.0 3.0 3.0 3.0 3.0 3.0 3.0
1 4.0 4.0 3.0 3.0 3.0 3.0 3.0 NaN NaN NaN
2 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 5.0
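When trailing NaNs are fine, a predefined width comes from reindexing the columns to range(width), and .to_numpy() then returns a plain (number of groups, width) matrix. This is a sketch on a made-up frame, with width = 4 standing in for the question's 10. One caveat: reindex silently drops any positions at or beyond width, so a group longer than width is truncated rather than reported.

```python
import numpy as np
import pandas as pd

# small made-up frame; width=4 stands in for the question's 10
df = pd.DataFrame({'cuts': list('aaabcc'), 'summary_x': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
width = 4  # predefined number of columns

g = df.groupby('cuts').cumcount()  # 0, 1, 2, ... within each group
matrix = (df.assign(pos=g)
            .pivot(index='cuts', columns='pos', values='summary_x')
            .reindex(columns=range(width))  # add all-NaN columns up to `width`
            .to_numpy())
print(matrix.shape)
print(matrix)
```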