Wide to long dataset using pandas

Time: 2019-06-28 04:44:52

Tags: python pandas transpose

There are many questions with similar titles, but none of them helped me solve the problem I am having with my dataset.

Dataset:

ID   Country Type Region Gender IA01_Raw  IA01_Class1  IA01_Class2 IA02_Raw IA02_Class1 IA02_Class2 QA_Include QA_Comments

SC1  France  A    Europe Male   4         8            1            J         4            1           yes       N/A
SC2  France  A    Europe Female 2         7            2            Q         6            4           yes       N/A
SC3  France  B    Europe Male   3         7            2            K         8            2           yes       N/A
SC4  France  A    Europe Male   4         8            2            A         2            1           yes       N/A
SC5  France  B    Europe Male   1         7            1            F         1            3           yes       N/A
ID6  France  A    Europe Male   2         8            1            R         3            7           yes       N/A
ID7  France  B    Europe Male   2         8            1            Q         4            6           yes       N/A
UC8  France  B    Europe Male   4         8            2            P         4            2           yes       N/A

Required output:

ID   Country Type Region Gender IA Raw Class1 Class2 QA_Include QA_Comments

SC1  France  A    Europe Male   01 K   8      1      yes        N/A
SC1  France  A    Europe Male   01 L   8      1      yes       N/A
SC1  France  A    Europe Male   01 P   8      1      yes       N/A
SC1  France  A    Europe Male   02 Q   8      1      yes       N/A
SC1  France  A    Europe Male   02 R   8      1      yes       N/A
SC1  France  A    Europe Male   02 T   8      1      yes       N/A
SC1  France  A    Europe Male   03 G   8      1      yes       N/A
SC1  France  A    Europe Male   03 R   8      1      yes       N/A
SC1  France  A    Europe Male   03 G   8      1      yes       N/A
SC1  France  A    Europe Male   04 K   8      1      yes       N/A
SC1  France  A    Europe Male   04 A   8      1      yes       N/A
SC1  France  A    Europe Male   04 P   8      1      yes       N/A
SC1  France  A    Europe Male   05 R   8      1      yes       N/A
....

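In the dataset, I have columns named IA[X]_NAME, where X = 1..9 and NAME = Raw, Class1 and Class2.

What I want to do is simply transpose these columns so that the result looks like the table shown in the required output, i.e. IA will show the X value, and Raw, Class1 and Class2 will show their pivoted values.

So, to achieve this, I sliced the columns as: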

idVars = list(excel_df_final.columns[0:40]) + list(excel_df_final.columns[472:527]) #These contain columns like ID, Country, Type etc
valueVars = excel_df_final.columns[41:472].tolist() #All the IA_ columns

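I don't know whether this step is necessary, but it gives me exactly the column slices I want. However, when I pass them to melt, it does not work properly. I have tried almost every method available in other questions.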

pd.melt(excel_df_final, id_vars=idVars,value_vars=valueVars)

I have also tried:

excel_df_final.set_index(idVars)[41:472].unstack()

But that did not work. Here is my wide_to_long attempt, which did not work either:

pd.wide_to_long(excel_df_final, stubnames = ['IA', 'Raw', 'Class1', 'Class2'], i=idVars, j=valueVars)

The error I get for wide_to_long is:

ValueError: operands could not be broadcast together with shapes (95,) (431,)

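Since my dataset actually has 526 columns, I split them into two lists: one contains 95 column names, which will be i, and the remaining 431 are the columns I need to turn into rows, as shown in the sample dataset.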
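A note on the wide_to_long call above: pd.wide_to_long documents j as a single string naming the new suffix column and stubnames as the prefixes shared by the wide columns, so passing a list of 431 column names as j is a likely source of the broadcast error. The function also expects columns shaped like stub + sep + suffix, which IA01_Raw is not, because the varying part comes first. Below is a minimal sketch, assuming only two rows and two identifier columns from the sample data, that first renames IA01_Raw to Raw_IA01 and then reshapes. The names id_cols, renamed and long_df are illustrative.

```python
import re

import pandas as pd

# Two rows of the sample dataset, trimmed to two identifier columns.
df = pd.DataFrame({
    "ID": ["SC1", "SC2"],
    "Country": ["France", "France"],
    "IA01_Raw": [4, 2],
    "IA01_Class1": [8, 7],
    "IA01_Class2": [1, 2],
    "IA02_Raw": ["J", "Q"],
    "IA02_Class1": [4, 6],
    "IA02_Class2": [1, 4],
})

id_cols = ["ID", "Country"]  # must uniquely identify each row

# Turn "IA01_Raw" into "Raw_IA01" so each column reads stub + "_" + suffix.
renamed = df.rename(columns=lambda c: re.sub(r"^(IA\d+)_(.+)$", r"\2_\1", c))

long_df = pd.wide_to_long(
    renamed,
    stubnames=["Raw", "Class1", "Class2"],  # prefixes of the wide columns
    i=id_cols,                              # identifier column(s)
    j="IA",                                 # name of the new suffix column
    sep="_",
    suffix=r"IA\d+",                        # non-numeric suffix stays a string
).reset_index()

print(long_df)
```

With the full dataset, id_cols would be the list of non-IA columns, provided that together they identify each row uniquely.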

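2 Answers:

Answer 0: (score: 2)

This will get you started. The essence is to use set_index, convert the columns to a MultiIndex, and then stack. There may be better solutions, but I would do it this way because the steps to reach the output are simple.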

# Set the index with columns that we don't want to "transpose"
df2 = df.set_index([
   'ID', 'Country', 'Type', 'Region', 'Gender', 'QA_Include', 'QA_Comments'])
# Convert headers to MultiIndex -- this is so we can melt IA values
df2.columns = pd.MultiIndex.from_tuples(map(tuple, df2.columns.str.split('_')))
# Call stack to replicate data, then reset the index
out =  df2.stack(level=0).reset_index().rename({'level_7': 'IA'}, axis=1)

out

     ID Country Type  Region  Gender QA_Include  QA_Comments    IA  Class1  Class2 Raw
0   SC1  France    A  Europe    Male        yes          NaN  IA01       8       1   4
1   SC1  France    A  Europe    Male        yes          NaN  IA02       4       1   J
2   SC2  France    A  Europe  Female        yes          NaN  IA01       7       2   2
3   SC2  France    A  Europe  Female        yes          NaN  IA02       6       4   Q
4   SC3  France    B  Europe    Male        yes          NaN  IA01       7       2   3
5   SC3  France    B  Europe    Male        yes          NaN  IA02       8       2   K
6   SC4  France    A  Europe    Male        yes          NaN  IA01       8       2   4
7   SC4  France    A  Europe    Male        yes          NaN  IA02       2       1   A
8   SC5  France    B  Europe    Male        yes          NaN  IA01       7       1   1
9   SC5  France    B  Europe    Male        yes          NaN  IA02       1       3   F
10  ID6  France    A  Europe    Male        yes          NaN  IA01       8       1   2
11  ID6  France    A  Europe    Male        yes          NaN  IA02       3       7   R
12  ID7  France    B  Europe    Male        yes          NaN  IA01       8       1   2
13  ID7  France    B  Europe    Male        yes          NaN  IA02       4       6   Q
14  UC8  France    B  Europe    Male        yes          NaN  IA01       8       2   4
15  UC8  France    B  Europe    Male        yes          NaN  IA02       4       2   P
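For comparison, the melt attempt from the question can be carried through to the same shape by splitting the melted column name and pivoting back. This is an alternative sketch rather than the method of the answer above. It assumes two sample rows, two identifier columns, and that the identifier columns uniquely identify each row. The names id_cols, melted and out_melt are illustrative.

```python
import pandas as pd

# Two rows of the sample dataset, trimmed to two identifier columns.
df = pd.DataFrame({
    "ID": ["SC1", "SC2"],
    "Country": ["France", "France"],
    "IA01_Raw": [4, 2],
    "IA01_Class1": [8, 7],
    "IA01_Class2": [1, 2],
    "IA02_Raw": ["J", "Q"],
    "IA02_Class1": [4, 6],
    "IA02_Class2": [1, 4],
})

id_cols = ["ID", "Country"]

# Melt every non-identifier column into (variable, value) pairs.
melted = df.melt(id_vars=id_cols, var_name="variable", value_name="value")

# Split "IA01_Raw" into the IA label ("IA01") and the field name ("Raw").
melted[["IA", "field"]] = melted["variable"].str.split("_", n=1, expand=True)

# Pivot the field names back into columns: one row per (ID, Country, IA).
out_melt = (
    melted.pivot(index=id_cols + ["IA"], columns="field", values="value")
    .reset_index()
    .rename_axis(columns=None)
)

print(out_melt)
```

Because Raw mixes numbers and letters, the pivoted value columns come back with object dtype and may need converting afterwards.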

Answer 1: (score: 1)

You can use pd.lreshape:

pd.lreshape(df.assign(IA01=['01']*len(df), IA02=['02']*len(df), IA09=['09']*len(df)),
            {'IA': ['IA01', 'IA02', 'IA09'],
             'Raw': ['IA01_Raw', 'IA02_Raw', 'IA09_Raw'],
             'Class1': ['IA01_Class1', 'IA02_Class1', 'IA09_Class1'],
             'Class2': ['IA01_Class2', 'IA02_Class2', 'IA09_Class2']})

Edit: just add the column names from the input that you want in the Raw/Class1/Class2 columns of the output to the lists inside the dictionary:

pd.lreshape(df.assign(IA01=['01']*len(df), IA02=['02']*len(df), IA09=['09']*len(df)),
            {'IA': ['IA01', 'IA02', 'IA09'],
             'Raw': ['IA01_Raw_baseline', 'IA02_Raw_midline', 'IA09_Raw_whatever'],
             'Class1': ['IA01_Class1_baseline', 'IA02_Class1_midline', 'IA09_Class1_whatever'],
             'Class2': ['IA01_Class2_baseline', 'IA02_Class2_midline', 'IA09_Class2_whatever']})

The documentation for this is not available. Use help(pd.lreshape) or refer here.
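Below is a minimal, self-contained sketch of the pd.lreshape approach, assuming only two sample rows and the two IA groups that appear in the sample dataset (IA01 and IA02). The helper columns IA01 and IA02 added with assign carry the labels that end up in the IA column. The name out_lr is illustrative.

```python
import pandas as pd

# Two rows of the sample dataset, trimmed to two identifier columns.
df = pd.DataFrame({
    "ID": ["SC1", "SC2"],
    "Country": ["France", "France"],
    "IA01_Raw": [4, 2],
    "IA01_Class1": [8, 7],
    "IA01_Class2": [1, 2],
    "IA02_Raw": ["J", "Q"],
    "IA02_Class1": [4, 6],
    "IA02_Class2": [1, 4],
})

# Add one constant label column per IA group, then map each output column
# to the list of input columns that should be stacked into it.
out_lr = pd.lreshape(
    df.assign(IA01=["01"] * len(df), IA02=["02"] * len(df)),
    {
        "IA": ["IA01", "IA02"],
        "Raw": ["IA01_Raw", "IA02_Raw"],
        "Class1": ["IA01_Class1", "IA02_Class1"],
        "Class2": ["IA01_Class2", "IA02_Class2"],
    },
)

print(out_lr)
```

Every list in the dictionary must have the same length, and columns not named in it (here ID and Country) are repeated as identifiers. By default lreshape drops rows where any of the stacked values is missing.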