I am trying to produce a DataFrame with a particular structure, but I can't seem to "stack" the data. A sample of my raw data:
import pandas as pd

# raw data
df = pd.DataFrame({'Name':['name1', 'name2', 'name3', 'name1', 'name2', 'name3', 'name1', 'name2', 'name3' ],
'Year':['freshman','sophomore','freshman', 'freshman','sophomore','freshman', 'freshman','sophomore','freshman'],
'Rotation':['ERJD','PEDI','MAM','PEDI', 'ERJD','PEDI','MAM','ERJD','ABD'],
'Week1':[1,1,1,0,0,0,0,0,0],
'Week2':[0,0,0,1,1,1,0,0,0],
'Week3':[0,0,0,0,0,0,1,1,1],
'Week4':[1,0,0,0,0,0,0,1,1]
})
df = df[['Name','Year','Rotation','Week1','Week2','Week3','Week4']]
which looks like this:
Name Year Rotation Week1 Week2 Week3 Week4
0 name1 freshman ERJD 1 0 0 1
1 name2 sophomore PEDI 1 0 0 0
2 name3 freshman MAM 1 0 0 0
3 name1 freshman PEDI 0 1 0 0
4 name2 sophomore ERJD 0 1 0 0
5 name3 freshman PEDI 0 1 0 0
6 name1 freshman MAM 0 0 1 0
7 name2 sophomore ERJD 0 0 1 1
8 name3 freshman ABD 0 0 1 1
I reshaped the DataFrame:
#Reshape Table + Filtering
df = pd.melt(df,
id_vars=['Name','Year','Rotation'],
value_vars=list(df.columns[3:]),
var_name='Week',
value_name='Sum of Value')
df = df.loc[df['Sum of Value'] == 1].reset_index(drop=True)
which produces:
Name Year Rotation Week Sum of Value
0 name1 freshman ERJD Week1 1
1 name2 sophomore PEDI Week1 1
2 name3 freshman MAM Week1 1
3 name1 freshman PEDI Week2 1
4 name2 sophomore ERJD Week2 1
5 name3 freshman PEDI Week2 1
6 name1 freshman MAM Week3 1
7 name2 sophomore ERJD Week3 1
8 name3 freshman ABD Week3 1
9 name1 freshman ERJD Week4 1
10 name2 sophomore ERJD Week4 1
11 name3 freshman ABD Week4 1
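On whether pd.melt is needed at all: the same long table can also be built by moving the week columns into the row index with DataFrame.stack. Below is a minimal sketch, assuming the sample frame from the question and a hypothetical `weeks` list of the week columns. The rows come out grouped by original row rather than by week, but they are the same rows the melt-and-filter step yields.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Name': ['name1', 'name2', 'name3'] * 3,
    'Year': ['freshman', 'sophomore', 'freshman'] * 3,
    'Rotation': ['ERJD', 'PEDI', 'MAM', 'PEDI', 'ERJD', 'PEDI', 'MAM', 'ERJD', 'ABD'],
    'Week1': [1, 1, 1, 0, 0, 0, 0, 0, 0],
    'Week2': [0, 0, 0, 1, 1, 1, 0, 0, 0],
    'Week3': [0, 0, 0, 0, 0, 0, 1, 1, 1],
    'Week4': [1, 0, 0, 0, 0, 0, 0, 1, 1],
})
weeks = ['Week1', 'Week2', 'Week3', 'Week4']  # hypothetical helper list

# Put the identifying columns in the index and label the column axis 'Week'
wide = df.set_index(['Name', 'Year', 'Rotation'])[weeks].rename_axis(columns='Week')
# stack() turns every (row, week) cell into its own row
long = wide.stack().rename('Sum of Value').reset_index()
# Keep only the weeks in which the student was on that rotation
long = long[long['Sum of Value'] == 1].reset_index(drop=True)
print(long)
```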
I then create a pivot table:
#Create Pivot
weeks = ['Week1', 'Week2', 'Week3', 'Week4']  # desired column order
pivot = df.pivot_table(index=['Rotation','Year'], columns='Week', values='Name', aggfunc=lambda x: ' '.join(x))
pivot = pivot.reindex(weeks, axis=1) # Change order of Columns
pivot
which produces:
Week1 Week2 Week3 Week4
Rotation Year
ABD freshman None None name3 name3
ERJD freshman name1 None None name1
sophomore None name2 name2 name2
MAM freshman name3 None name1 None
PEDI freshman None name1 name3 None None
sophomore name2 None None None
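I would like to stack the names in the table; for example, Week2 PEDI has name1 and name3 side by side. How can I put the names on separate rows? Is there a better way to do this than a pivot table? Is the pd.melt step even necessary?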
Desired structure:
Week1 Week2 Week3 Week4
Rotation Year
ABD freshman None None name3 name3
ERJD freshman name1 None None name1
sophomore None name2 name2 name2
MAM freshman name3 None name1 None
PEDI freshman None name1 None None
name3
sophomore name2 None None None
Thanks in advance for your help!
Solution:
After the pd.melt step, do the following:
weeks = ['Week1', 'Week2', 'Week3', 'Week4']  # desired column order
df['aggval'] = df['Week'].map(str) + df['Rotation']
df['aggval'] = df.groupby(['aggval']).cumcount()+1
pivot = df.pivot_table(index=['Rotation','aggval'], columns='Week', values='Name', aggfunc=lambda x: ' '.join(x)).fillna('')
pivot = pivot.reindex(weeks, axis=1)
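The solution above indexes the pivot on Rotation and aggval only, so Year is not part of the row labels, and names from different years that share a rotation can land on the same row (for example, PEDI sophomore name2 in Week1 and PEDI freshman name1 in Week2). Below is a minimal sketch of a variant that numbers the names within each (Rotation, Year, Week) cell with cumcount and keeps Year in the index. The `slot` column name is a hypothetical choice, and DataFrame.pivot with a list-valued index assumes pandas 1.1 or later.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Name': ['name1', 'name2', 'name3'] * 3,
    'Year': ['freshman', 'sophomore', 'freshman'] * 3,
    'Rotation': ['ERJD', 'PEDI', 'MAM', 'PEDI', 'ERJD', 'PEDI', 'MAM', 'ERJD', 'ABD'],
    'Week1': [1, 1, 1, 0, 0, 0, 0, 0, 0],
    'Week2': [0, 0, 0, 1, 1, 1, 0, 0, 0],
    'Week3': [0, 0, 0, 0, 0, 0, 1, 1, 1],
    'Week4': [1, 0, 0, 0, 0, 0, 0, 1, 1],
})
weeks = ['Week1', 'Week2', 'Week3', 'Week4']

# Same melt + filter as in the question
long = df.melt(id_vars=['Name', 'Year', 'Rotation'], value_vars=weeks,
               var_name='Week', value_name='Sum of Value')
long = long[long['Sum of Value'] == 1]

# Number the names that share a (Rotation, Year, Week) cell: 0, 1, 2, ...
long = long.assign(slot=long.groupby(['Rotation', 'Year', 'Week']).cumcount())

# One row per (Rotation, Year, slot), so each name gets its own row
out = (long.pivot(index=['Rotation', 'Year', 'slot'], columns='Week', values='Name')
           .reindex(columns=weeks)
           .fillna('')
           .sort_index())
print(out)
```

Because every (Rotation, Year, slot, Week) combination is unique by construction, plain pivot is enough here and no string-joining aggfunc is needed.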
Answer 0 (score: 0)
You can loop over the weeks of interest and fill the DataFrame conditionally, like this:
import numpy as np

# Replace each 1 with the student's name; leave the 0s as they are
for week in ['Week1','Week2','Week3','Week4']:
df[week] = np.where(df[week]==1, df['Name'], df[week])
This gives:
Name Year Rotation Week1 Week2 Week3 Week4
0 name1 freshman ERJD name1 0 0 name1
1 name2 sophmore PEDI name2 0 0 0
2 name3 freshman MAM name3 0 0 0
3 name1 freshman PEDI 0 name1 0 0
4 name2 sophmore ERJD 0 name2 0 0
5 name3 freshman PEDI 0 name3 0 0
6 name1 freshman MAM 0 0 name1 0
7 name2 sophmore ERJD 0 0 name2 name2
8 name3 freshman ABD 0 0 name3 name3
Then you can group the DataFrame and collect the string entries into lists:
grouped = df.drop('Name', axis=1).groupby(['Rotation','Year']).agg(lambda x: [i for i in x if isinstance(i, str)])
which gives:
Week1 Week2 Week3 Week4
Rotation Year
ABD freshman [] [] [name3] [name3]
ERJD freshman [name1] [] [] [name1]
sophmore [] [name2] [name2] [name2]
MAM freshman [name3] [] [name1] []
PEDI freshman [] [name1, name3] [] []
sophmore [name2] [] [] []
Note that the OP's desired output contains an error: there is no ('MAM', 'sophmore') group. Also note, for clarity, that 'sophmore' is spelled 'sophomore'.
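The lists in the grouped output can then be spread onto separate rows by padding each group's lists to a common length and emitting one row per position. Below is a minimal sketch, assuming a `grouped` frame shaped like the table above. Here it is rebuilt by hand (with 'sophomore' spelled correctly) so the snippet runs on its own, and `stacked` is a hypothetical name.

```python
import pandas as pd

# Hand-built stand-in for `grouped`: a list of names in every cell
index = pd.MultiIndex.from_tuples(
    [('ABD', 'freshman'), ('ERJD', 'freshman'), ('ERJD', 'sophomore'),
     ('MAM', 'freshman'), ('PEDI', 'freshman'), ('PEDI', 'sophomore')],
    names=['Rotation', 'Year'])
grouped = pd.DataFrame({
    'Week1': [[], ['name1'], [], ['name3'], [], ['name2']],
    'Week2': [[], [], ['name2'], [], ['name1', 'name3'], []],
    'Week3': [['name3'], [], ['name2'], ['name1'], [], []],
    'Week4': [['name3'], ['name1'], ['name2'], [], [], []],
}, index=index)

rows = []
for (rotation, year), cells in grouped.iterrows():
    # Rows needed for this group: the length of its longest list
    depth = max(1, max(len(names) for names in cells))
    for slot in range(depth):
        row = {'Rotation': rotation, 'Year': year}
        for week, names in cells.items():
            # Take the slot-th name, or an empty string if this list is shorter
            row[week] = names[slot] if slot < len(names) else ''
        rows.append(row)

stacked = pd.DataFrame(rows).set_index(['Rotation', 'Year'])
print(stacked)
```

In this sketch, ('PEDI', 'freshman') becomes two rows, with name1 and name3 stacked in the Week2 column.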
Answer 1 (score: 0)
You can do this with set_index and mul:
import numpy as np

df1 = df.set_index(['Rotation','Year'])
df1.filter(like='Week').mul(df1['Name'], axis=0)\
.replace('',np.nan)\
.sort_index()
Output:
Week1 Week2 Week3 Week4
Rotation Year
ABD freshman NaN NaN name3 name3
ERJD freshman name1 NaN NaN name1
sophomore NaN name2 NaN NaN
sophomore NaN NaN name2 name2
MAM freshman name3 NaN NaN NaN
freshman NaN NaN name1 NaN
PEDI freshman NaN name1 NaN NaN
freshman NaN name3 NaN NaN
sophomore name2 NaN NaN NaN
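The mul approach relies on Python's int * str behaviour (1 * 'name1' is 'name1', 0 * 'name1' is ''), followed by replacing the empty strings with NaN. Below is a minimal sketch of the same idea that swaps in np.where to put the name wherever the week flag equals 1 and NaN everywhere else, with no intermediate empty strings. It assumes the sample frame from the question and a hypothetical `weeks` list.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Name': ['name1', 'name2', 'name3'] * 3,
    'Year': ['freshman', 'sophomore', 'freshman'] * 3,
    'Rotation': ['ERJD', 'PEDI', 'MAM', 'PEDI', 'ERJD', 'PEDI', 'MAM', 'ERJD', 'ABD'],
    'Week1': [1, 1, 1, 0, 0, 0, 0, 0, 0],
    'Week2': [0, 0, 0, 1, 1, 1, 0, 0, 0],
    'Week3': [0, 0, 0, 0, 0, 0, 1, 1, 1],
    'Week4': [1, 0, 0, 0, 0, 0, 0, 1, 1],
})
weeks = ['Week1', 'Week2', 'Week3', 'Week4']  # hypothetical helper list

df1 = df.set_index(['Rotation', 'Year'])
# Boolean mask: True where the student is on that rotation in that week
on_rotation = df1[weeks].eq(1).to_numpy()
# Names as an (n, 1) object array so they broadcast across the week columns
names = df1['Name'].to_numpy(dtype=object)[:, None]
out = pd.DataFrame(np.where(on_rotation, names, np.nan),
                   index=df1.index, columns=weeks).sort_index()
print(out)
```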
Answer 2 (score: 0)
After the pd.melt step, do the following:
weeks = ['Week1', 'Week2', 'Week3', 'Week4']  # desired column order
df['aggval'] = df['Week'].map(str) + df['Rotation']
df['aggval'] = df.groupby(['aggval']).cumcount()+1
pivot = df.pivot_table(index=['Rotation','aggval'], columns='Week', values='Name', aggfunc=lambda x: ' '.join(x)).fillna('')
pivot = pivot.reindex(weeks, axis=1)