Suppose I have a dataset from a repeated-measures study that looks like this:
   control  dose_high  dose_low  gender  participant
0        4          6         4       m            1
1        3          5         5       f            2
2        2          8         6       m            3
To analyze this data, I want to convert it to stacked (long) format while keeping gender as a covariate:
stacked = df[['dose_high', 'dose_low', 'control']].stack()
df2 = stacked.reset_index()
# Join the stacked values back to the covariates via the original row index
merged = df.merge(df2, how='outer', left_index=True, right_on='level_0')
print(merged[['gender', 'participant', 'level_1', 0]])
This produces the correct result:
   gender  participant    level_1  0
0       m            1  dose_high  6
1       m            1   dose_low  4
2       m            1    control  4
3       f            2  dose_high  5
4       f            2   dose_low  5
5       f            2    control  3
6       m            3  dose_high  8
7       m            3   dose_low  6
8       m            3    control  2
However, this feels like a rather clumsy way to do it. Is there a cleaner approach I'm missing?
Answer (score: 2)
import io
import pandas as pd
text = '''\
control dose_high dose_low gender participant
0 4 6 4 m 1
1 3 5 5 f 2
2 2 8 6 m 3'''
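# The header has one field fewer than the data rows, so read_csv uses the first column as the index
df = pd.read_csv(io.StringIO(text), sep=r'\s+')
# Every column not listed in id_vars is unpivoted into variable/value pairs
result = pd.melt(df, id_vars=['participant', 'gender'])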
print(result)
which yields:
   participant  gender   variable  value
0            1       m    control      4
1            2       f    control      3
2            3       m    control      2
3            1       m  dose_high      6
4            2       f  dose_high      5
5            3       m  dose_high      8
6            1       m   dose_low      4
7            2       f   dose_low      5
8            3       m   dose_low      6
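The `melt` output above is grouped by condition rather than by participant. The sketch below shows how `melt` can reproduce the question's per-participant layout, and, for comparison, how the question's own `stack()` idea works without the merge once `participant` and `gender` are moved into the index. The column names `condition` and `value` and the helper list `conditions` are illustrative choices, not part of the original answer.

```python
import io

import pandas as pd

text = '''\
control dose_high dose_low gender participant
0 4 6 4 m 1
1 3 5 5 f 2
2 2 8 6 m 3'''
df = pd.read_csv(io.StringIO(text), sep=r'\s+')

# Hypothetical helper list: the measurement columns, in the desired order
conditions = ['dose_high', 'dose_low', 'control']

# Variant 1: melt with explicit value_vars and readable column names,
# then a stable sort so rows are grouped by participant while keeping
# the dose_high, dose_low, control order within each participant.
via_melt = (
    pd.melt(df, id_vars=['participant', 'gender'], value_vars=conditions,
            var_name='condition', value_name='value')
    .sort_values('participant', kind='mergesort')
    .reset_index(drop=True)
)

# Variant 2: the stack() approach from the question, minus the merge.
# With the covariates in the index, stack() carries them along with
# every measurement; rename_axis names the new index level.
via_stack = (
    df.set_index(['participant', 'gender'])[conditions]
    .rename_axis(columns='condition')
    .stack()
    .reset_index(name='value')
)

print(via_melt)
```

Both variants produce the same rows as the merge-based output in the question, with `condition` and `value` in place of `level_1` and `0`. The stable (`mergesort`) sort matters here: an unstable sort could shuffle the conditions within each participant.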