Question

There are many questions with similar titles, but none of them has helped me solve the problem I have with my dataset.

Dataset:

ID   Country Type Region Gender IA01_Raw  IA01_Class1  IA01_Class2 IA02_Raw IA02_Class1 IA02_Class2 QA_Include QA_Comments

SC1  France  A    Europe Male   4         8            1            J         4            1           yes       N/A
SC2  France  A    Europe Female 2         7            2            Q         6            4           yes       N/A
SC3  France  B    Europe Male   3         7            2            K         8            2           yes       N/A
SC4  France  A    Europe Male   4         8            2            A         2            1           yes       N/A
SC5  France  B    Europe Male   1         7            1            F         1            3           yes       N/A
ID6  France  A    Europe Male   2         8            1            R         3            7           yes       N/A
ID7  France  B    Europe Male   2         8            1            Q         4            6           yes       N/A
UC8  France  B    Europe Male   4         8            2            P         4            2           yes       N/A

Required output:

ID   Country Type Region Gender IA Raw Class1 Class2 QA_Include QA_Comments

SC1  France  A    Europe Male   01 K   8      1      yes        N/A
SC1  France  A    Europe Male   01 L   8      1      yes       N/A
SC1  France  A    Europe Male   01 P   8      1      yes       N/A
SC1  France  A    Europe Male   02 Q   8      1      yes       N/A
SC1  France  A    Europe Male   02 R   8      1      yes       N/A
SC1  France  A    Europe Male   02 T   8      1      yes       N/A
SC1  France  A    Europe Male   03 G   8      1      yes       N/A
SC1  France  A    Europe Male   03 R   8      1      yes       N/A
SC1  France  A    Europe Male   03 G   8      1      yes       N/A
SC1  France  A    Europe Male   04 K   8      1      yes       N/A
SC1  France  A    Europe Male   04 A   8      1      yes       N/A
SC1  France  A    Europe Male   04 P   8      1      yes       N/A
SC1  France  A    Europe Male   05 R   8      1      yes       N/A
....

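In the dataset I have columns named IA[X]_NAME, where X = 1..9 and NAME = Raw, Class1 and Class2.

What I want to do is simply transpose these columns so that the result looks like the table shown in the required output: IA shows the X value, and Raw and the Class columns show their pivoted values.

So, to achieve this, I sliced the columns as follows: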

idVars = list(excel_df_final.columns[0:40]) + list(excel_df_final.columns[472:527])  # These contain columns like ID, Country, Type etc.
valueVars = excel_df_final.columns[41:472].tolist()  # All the IA_ columns

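I do not know whether this step is necessary, but it gives me exactly the column slices I want. However, when I pass them to melt, it does not work properly. I have tried almost every method available in the other questions.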

pd.melt(excel_df_final, id_vars=idVars,value_vars=valueVars)

I also tried:

excel_df_final.set_index(idVars)[41:472].unstack()

But that did not work. Here is my wide_to_long attempt, which did not work either:

pd.wide_to_long(excel_df_final, stubnames = ['IA', 'Raw', 'Class1', 'Class2'], i=idVars, j=valueVars)

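The error I get from wide_to_long is:

ValueError: operands could not be broadcast together with shapes (95,) (431,)

Since my dataset actually has 526 columns, I split them into two lists: one holds 95 column names, which will be i, and the remaining 431 are the columns I need to show as rows, as in the sample dataset.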

Answer 1

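This should get you started. The idea is to use set_index, convert the columns to a MultiIndex, and then stack. There may be better solutions, but I would do it this way because it reaches the output in simple steps.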

import pandas as pd

# Set the index with columns that we don't want to "transpose"
df2 = df.set_index([
   'ID', 'Country', 'Type', 'Region', 'Gender', 'QA_Include', 'QA_Comments'])
# Convert headers to MultiIndex -- this is so we can melt IA values
df2.columns = pd.MultiIndex.from_tuples(map(tuple, df2.columns.str.split('_')))
# Call stack to replicate data, then reset the index
# (the stacked level is unnamed, so it comes back as 'level_7' after the 7 index columns)
out = df2.stack(level=0).reset_index().rename({'level_7': 'IA'}, axis=1)

out

     ID Country Type  Region  Gender QA_Include  QA_Comments    IA  Class1  Class2 Raw
0   SC1  France    A  Europe    Male        yes          NaN  IA01       8       1   4
1   SC1  France    A  Europe    Male        yes          NaN  IA02       4       1   J
2   SC2  France    A  Europe  Female        yes          NaN  IA01       7       2   2
3   SC2  France    A  Europe  Female        yes          NaN  IA02       6       4   Q
4   SC3  France    B  Europe    Male        yes          NaN  IA01       7       2   3
5   SC3  France    B  Europe    Male        yes          NaN  IA02       8       2   K
6   SC4  France    A  Europe    Male        yes          NaN  IA01       8       2   4
7   SC4  France    A  Europe    Male        yes          NaN  IA02       2       1   A
8   SC5  France    B  Europe    Male        yes          NaN  IA01       7       1   1
9   SC5  France    B  Europe    Male        yes          NaN  IA02       1       3   F
10  ID6  France    A  Europe    Male        yes          NaN  IA01       8       1   2
11  ID6  France    A  Europe    Male        yes          NaN  IA02       3       7   R
12  ID7  France    B  Europe    Male        yes          NaN  IA01       8       1   2
13  ID7  France    B  Europe    Male        yes          NaN  IA02       4       6   Q
14  UC8  France    B  Europe    Male        yes          NaN  IA01       8       2   4
15  UC8  France    B  Europe    Male        yes          NaN  IA02       4       2   P
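
As an alternative to the stack approach, pd.wide_to_long (which the question attempted) can probably do the same reshape once the column names are rearranged. It expects j to be a single new column name (a string) rather than a list, and each stub name to be a prefix of the wide columns. The sketch below is not part of the answer above: the cut-down frame, the helper stub_first, and the choice of id columns are assumptions made for illustration only.

```python
import pandas as pd

# Cut-down frame mirroring the question's layout (assumed sample values).
df = pd.DataFrame({
    "ID": ["SC1", "SC2"],
    "Country": ["France", "France"],
    "IA01_Raw": [4, 2],
    "IA01_Class1": [8, 7],
    "IA01_Class2": [1, 2],
    "IA02_Raw": ["J", "Q"],
    "IA02_Class1": [4, 6],
    "IA02_Class2": [1, 4],
})

id_cols = ["ID", "Country"]  # must uniquely identify each row


def stub_first(col):
    """Hypothetical helper: turn 'IA01_Raw' into 'Raw_01'; leave other names alone."""
    if col.startswith("IA") and "_" in col:
        number, name = col[2:].split("_", 1)
        return f"{name}_{number}"
    return col


long_df = pd.wide_to_long(
    df.rename(columns=stub_first),
    stubnames=["Raw", "Class1", "Class2"],  # prefixes of the renamed columns
    i=id_cols,
    j="IA",          # name of the new column holding the numeric suffix
    sep="_",
    suffix=r"\d+",
).reset_index()

# wide_to_long may convert the suffix to a number; restore the two-digit label.
long_df["IA"] = long_df["IA"].astype(str).str.zfill(2)
long_df = long_df.sort_values(["ID", "IA"]).reset_index(drop=True)
print(long_df)
```

With the real 526-column frame, id_cols would be the list of non-IA columns; wide_to_long raises a ValueError if those columns do not uniquely identify each row.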

Answer 2

You can use pd.lreshape:

pd.lreshape(df.assign(IA01=['01']*len(df), IA02=['02']*len(df), IA09=['09']*len(df)),
            {'IA': ['IA01', 'IA02', 'IA09'],
             'Raw': ['IA01_Raw', 'IA02_Raw', 'IA09_Raw'],
             'Class1': ['IA01_Class1', 'IA02_Class1', 'IA09_Class1'],
             'Class2': ['IA01_Class2', 'IA02_Class2', 'IA09_Class2']
            })

Edit: just add the input column names you want to the lists inside the dictionary, under the output column (Raw/Class1/Class2) they belong to:

pd.lreshape(df.assign(IA01=['01']*len(df), IA02=['02']*len(df), IA09=['09']*len(df)),
            {'IA': ['IA01', 'IA02', 'IA09'],
             'Raw': ['IA01_Raw_baseline', 'IA02_Raw_midline', 'IA09_Raw_whatever'],
             'Class1': ['IA01_Class1_baseline', 'IA02_Class1_midline', 'IA09_Class1_whatever'],
             'Class2': ['IA01_Class2_baseline', 'IA02_Class2_midline', 'IA09_Class2_whatever']
            })

Documentation for this is not available. Use help(pd.lreshape).
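
To make the pd.lreshape idea concrete, here is a minimal, self-contained sketch on a cut-down frame. The sample values and the names group_ids, labelled and groups are assumptions for illustration. The constant IA01/IA02 columns are added only so that each group carries its label into the new IA column.

```python
import pandas as pd

# Cut-down frame mirroring the question's layout (assumed sample values).
df = pd.DataFrame({
    "ID": ["SC1", "SC2"],
    "IA01_Raw": [4, 2],
    "IA01_Class1": [8, 7],
    "IA01_Class2": [1, 2],
    "IA02_Raw": ["J", "Q"],
    "IA02_Class1": [4, 6],
    "IA02_Class2": [1, 4],
})

group_ids = ["01", "02"]

# Add one constant label column per group, e.g. IA01 = "01" on every row.
labelled = df.assign(**{f"IA{n}": n for n in group_ids})

# Map each output column to the list of input columns that feed it.
# Every list must have the same length and the same group order.
groups = {
    "IA": [f"IA{n}" for n in group_ids],
    "Raw": [f"IA{n}_Raw" for n in group_ids],
    "Class1": [f"IA{n}_Class1" for n in group_ids],
    "Class2": [f"IA{n}_Class2" for n in group_ids],
}

long_df = pd.lreshape(labelled, groups)
print(long_df)
```

Columns not mentioned in groups (here only ID) are repeated for every group. By default lreshape drops rows that have a missing value in any of the reshaped columns; pass dropna=False to keep them.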
