Question

I have searched a lot and can't seem to find a pivot approach that fits my specific problem. Here is a simple example of what I'm looking for:

Long table

dependent_variable  step   a   b
               5.5     1  20  30
               5.5     2  25  37
               6.1     1  22  19
               6.1     2  18  29

Desired wide table

dependent_variable   a_step1 a_step2 b_step1  b_step2
         5.5            20       25      30       37
         6.1            22       18      19       29

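Essentially, I want to pivot on the step column and turn the remaining independent variables (a and b in this example) into columns whose names include the step number, each holding the a/b value associated with that step.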

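Once the data is pivoted, I will use the dependent variable column and the newly pivoted columns, as NumPy arrays, to feed various machine learning algorithms.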

When I tried piRSquared's suggestion (thanks), I got the error: Index contains duplicate entries, cannot reshape.

Then I tried the following (from here):

d1 = data.set_index(['dependent_variable', 'step'], append=True).unstack()
d1.columns = d1.columns.map(lambda x: '{}_step{}'.format(*x))
d1.reset_index(inplace=True)

and (using the example table) got the following:

level_0   dependent_variable a_step1 a_step2 b_step1 b_step2
  1               5.5           20      NaN    30       NaN
  2               5.5           NaN     25     NaN      37
  3               6.1           22      NaN    19       NaN
  4               6.1           NaN     18     NaN      29

So I am still missing a step.

Answer 1

Assuming your data frame is named df and that dependent_variable and step are not already in the index:

d1 = df.set_index(['dependent_variable', 'step']).unstack()
d1.columns = d1.columns.map(lambda x: '{}_step{}'.format(*x))
d1.reset_index(inplace=True)

print(d1)

   dependent_variable  a_step1  a_step2  b_step1  b_step2
0                 5.5       20       25       30       37
1                 6.1       22       18       19       29
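The "Index contains duplicate entries, cannot reshape" error mentioned in the question is raised when the same (dependent_variable, step) pair occurs more than once, because unstack would then have to put two values into one cell. The following is only a sketch under assumptions: the sample data is made up, the helper column name rep is hypothetical, and it assumes that repeated pairs appear in matching row order so they can be numbered with groupby(...).cumcount().

```python
import pandas as pd

# Hypothetical data: dependent_variable 5.5 occurs in two separate runs,
# so the pairs (5.5, 1) and (5.5, 2) each appear twice.
df = pd.DataFrame({
    "dependent_variable": [5.5, 5.5, 5.5, 5.5, 6.1, 6.1],
    "step": [1, 2, 1, 2, 1, 2],
    "a": [20, 25, 21, 26, 22, 18],
    "b": [30, 37, 31, 38, 19, 29],
})

# Number the repeats of each (dependent_variable, step) pair: 0, 1, ...
# "rep" is an illustrative helper column, not part of the original data.
df["rep"] = df.groupby(["dependent_variable", "step"]).cumcount()

# With "rep" in the index every index entry is unique, so unstack can reshape.
wide = df.set_index(["dependent_variable", "rep", "step"]).unstack("step")

# Flatten the (value_column, step) column tuples into names like "a_step1".
wide.columns = [f"{name}_step{step}" for name, step in wide.columns]
wide = wide.reset_index().drop(columns="rep")

print(wide)
```

If the repeated rows should instead be combined into one row (for example averaged), DataFrame.pivot_table with an aggfunc would be the alternative to consider.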

Answer 2

It looks like you are looking for pd.pivot. From the pandas documentation (https://pandas.pydata.org/pandas-docs/stable/reshaping.html):

"If the values argument is omitted, and the input DataFrame has more than one column of values which are not used as column or index inputs to pivot, then the resulting “pivoted” DataFrame will have hierarchical columns whose topmost level indicates the respective value column."

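import pandas as pd

# Rebuild the example long table from the question
df = pd.DataFrame({'dependent_variable': [5.5, 5.5, 6.1, 6.1],
                   'step': [1, 2, 1, 2],
                   'a': [20, 25, 22, 18],
                   'b': [30, 37, 19, 29]})

# values is omitted, so both a and b are pivoted
df = df.pivot(index='dependent_variable', columns='step')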

yields:

                     a       b
step                 1   2   1   2
dependent_variable
5.5                 20  25  30  37
6.1                 22  18  19  29

The result has hierarchical columns, which might be more helpful than the output you indicated. However, you can change to a single-level column index with:

df.columns = df.columns.tolist()

The columns won't have the exact names you wanted, but you can then rename them.
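As a sketch of that renaming step, the hierarchical column labels can be joined into names such as a_step1 directly, and the result split into NumPy arrays as the question describes. The variable names wide, X and y are illustrative choices, not something from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "dependent_variable": [5.5, 5.5, 6.1, 6.1],
    "step": [1, 2, 1, 2],
    "a": [20, 25, 22, 18],
    "b": [30, 37, 19, 29],
})

# Pivot without "values": a and b both become top-level column groups.
wide = df.pivot(index="dependent_variable", columns="step")

# Each column label is a (value_column, step) tuple, e.g. ("a", 1);
# join the two parts into a single flat name.
wide.columns = [f"{name}_step{step}" for name, step in wide.columns]
wide = wide.reset_index()

# Split into a target vector and a feature matrix (names are illustrative).
y = wide["dependent_variable"].to_numpy()
X = wide.drop(columns="dependent_variable").to_numpy()

print(wide)
print(X.shape, y.shape)
```

Building the names from the tuples keeps the step number in each column name, which matches the layout asked for in the question.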

Long to wide data frame in pandas, with pivot column names in the new columns
