I have searched a lot and cannot find a pivot approach that fits my specific problem. Here is a simple example of what I am looking for:
Long table
dependent_variable step a b
5.5 1 20 30
5.5 2 25 37
6.1 1 22 19
6.1 2 18 29
Desired wide table
dependent_variable a_step1 a_step2 b_step1 b_step2
5.5 20 25 30 37
6.1 22 18 19 29
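In effect, I want to pivot on the step column and build the column names for the remaining independent variables (a and b in this case) from the variable name and the step number, with the associated a / b values as the cell values.
Once pivoted, I will use the dependent variable column as a numpy array, together with the newly pivoted independent variables, to feed various machine learning algorithms.
When I tried piRSquared's suggestion (thank you), I got the error: Index contains duplicate entries, cannot reshape.
Then I tried (from Here)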
d1 = data.set_index(['dependent_variable', 'step'], append=True).unstack()
d1.columns = d1.columns.map(lambda x: '{}_step{}'.format(*x))
d1.reset_index(inplace=True)
and (using the example table) got the following:
level_0 dependent_variable a_step1 a_step2 b_step1 b_step2
1 5.5 20 NaN 30 NaN
2 5.5 NaN 25 NaN 37
3 6.1 22 NaN 19 NaN
4 6.1 NaN 18 NaN 29
So I am still missing a step.
Answer 0 (score: 0)
Assuming your dataframe is named df and that dependent_variable and step are not already in the index:
d1 = df.set_index(['dependent_variable', 'step']).unstack()
d1.columns = d1.columns.map(lambda x: '{}_step{}'.format(*x))
d1.reset_index(inplace=True)
print(d1)
dependent_variable a_step1 a_step2 b_step1 b_step2
0 5.5 20 25 30 37
1 6.1 22 18 19 29
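The "Index contains duplicate entries, cannot reshape" error from the question is raised when some (dependent_variable, step) pair occurs more than once, which the four-row sample does not show. Below is a minimal sketch of one possible workaround, assuming the rows are ordered so that the n-th occurrence of step 1 belongs with the n-th occurrence of step 2 for the same dependent_variable. The sample data with repeated pairs and the helper column `run` are hypothetical and not part of the question.

```python
import pandas as pd

# Hypothetical data: the pairs (5.5, 1) and (5.5, 2) each occur twice,
# so a plain set_index([...]).unstack() would raise the duplicate error.
df = pd.DataFrame({
    'dependent_variable': [5.5, 5.5, 5.5, 5.5, 6.1, 6.1],
    'step': [1, 2, 1, 2, 1, 2],
    'a': [20, 25, 21, 26, 22, 18],
    'b': [30, 37, 31, 38, 19, 29],
})

# Number repeated (dependent_variable, step) pairs 0, 1, 2, ... so that
# (dependent_variable, run, step) is unique for every row.
df['run'] = df.groupby(['dependent_variable', 'step']).cumcount()

# Move only the 'step' level into the columns.
wide = df.set_index(['dependent_variable', 'run', 'step']).unstack('step')

# Flatten the (variable, step) column labels into names like 'a_step1'.
wide.columns = ['{}_step{}'.format(name, step) for name, step in wide.columns]

# Bring dependent_variable back as a column and drop the helper counter.
wide = wide.reset_index().drop(columns='run')
print(wide)
```

Unlike `append=True`, which keeps every original row distinct and therefore leaves NaN in half of the cells, the counter only separates rows that would otherwise collide.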
Answer 1 (score: 0)
It looks like you are looking for pd.pivot.
"If the values argument is omitted, and the input DataFrame has more than one column of values which are not used as column or index inputs to pivot, then the resulting “pivoted” DataFrame will have hierarchical columns whose topmost level indicates the respective value column." (https://pandas.pydata.org/pandas-docs/stable/reshaping.html)
import pandas as pd
df = pd.DataFrame({'dependent_variable': [5.5, 5.5, 6.1, 6.1],
                   'step': [1, 2, 1, 2],
                   'a': [20, 25, 22, 18],
                   'b': [30, 37, 19, 29],
                   })
df = df.pivot(index='dependent_variable',
              columns='step')
This yields:
                     a       b
step                 1   2   1   2
dependent_variable
5.5                 20  25  30  37
6.1                 22  18  19  29
The result has a hierarchical column index, which might be more helpful than the output you indicated. However, you can change it to a single-level column index with
df.columns = df.columns.tolist()
The columns will not have the exact names you wanted, but you could then rename them.
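The rename step can be sketched as follows, using the same sample data as this answer. The names `wide`, `X` and `y` are illustrative only, and the last two lines assume the goal stated in the question of passing numpy arrays to a machine learning library.

```python
import pandas as pd

df = pd.DataFrame({'dependent_variable': [5.5, 5.5, 6.1, 6.1],
                   'step': [1, 2, 1, 2],
                   'a': [20, 25, 22, 18],
                   'b': [30, 37, 19, 29]})

wide = df.pivot(index='dependent_variable', columns='step')

# Each label of the hierarchical columns is a (value_column, step) tuple,
# e.g. ('a', 1); join the two parts into names such as 'a_step1'.
wide.columns = ['{}_step{}'.format(name, step) for name, step in wide.columns]

# Turn dependent_variable from the index back into an ordinary column.
wide = wide.reset_index()

# Split into a target vector and a feature matrix as numpy arrays.
y = wide['dependent_variable'].to_numpy()
X = wide.drop(columns='dependent_variable').to_numpy()
```

Building the names directly from the tuples avoids the intermediate flat index and a separate rename call.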