There are many questions with similar titles, but none of them solved the problem I am facing with my dataset.
Dataset:
ID Country Type Region Gender IA01_Raw IA01_Class1 IA01_Class2 IA02_Raw IA02_Class1 IA02_Class2 QA_Include QA_Comments
SC1 France A Europe Male 4 8 1 J 4 1 yes N/A
SC2 France A Europe Female 2 7 2 Q 6 4 yes N/A
SC3 France B Europe Male 3 7 2 K 8 2 yes N/A
SC4 France A Europe Male 4 8 2 A 2 1 yes N/A
SC5 France B Europe Male 1 7 1 F 1 3 yes N/A
ID6 France A Europe Male 2 8 1 R 3 7 yes N/A
ID7 France B Europe Male 2 8 1 Q 4 6 yes N/A
UC8 France B Europe Male 4 8 2 P 4 2 yes N/A
Required output:
ID Country Type Region Gender IA Raw Class1 Class2 QA_Include QA_Comments
SC1 France A Europe Male 01 K 8 1 yes N/A
SC1 France A Europe Male 01 L 8 1 yes N/A
SC1 France A Europe Male 01 P 8 1 yes N/A
SC1 France A Europe Male 02 Q 8 1 yes N/A
SC1 France A Europe Male 02 R 8 1 yes N/A
SC1 France A Europe Male 02 T 8 1 yes N/A
SC1 France A Europe Male 03 G 8 1 yes N/A
SC1 France A Europe Male 03 R 8 1 yes N/A
SC1 France A Europe Male 03 G 8 1 yes N/A
SC1 France A Europe Male 04 K 8 1 yes N/A
SC1 France A Europe Male 04 A 8 1 yes N/A
SC1 France A Europe Male 04 P 8 1 yes N/A
SC1 France A Europe Male 05 R 8 1 yes N/A
....
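In the dataset I have columns named IA[X]_NAME, where X = 1..9 and NAME is one of Raw, Class1 and Class2.
All I want to do is transpose these columns so that the result looks like the table shown under "Required output": IA holds the X value, while Raw, Class1 and Class2 hold the corresponding pivoted values.
To achieve this, I sliced the columns as follows: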
idVars = list(excel_df_final.columns[0:40]) + list(excel_df_final.columns[472:527]) #These contain columns like ID, Country, Type etc
valueVars = excel_df_final.columns[41:472].tolist() #All the IA_ columns
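I don't know whether this step is necessary, but it gives me exactly the column slices I need. However, when I pass them to melt, it does not work. I have tried almost every approach suggested in the other questions.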
pd.melt(excel_df_final, id_vars=idVars,value_vars=valueVars)
I also tried:
excel_df_final.set_index(idVars)[41:472].unstack()
But that did not work. This wide-to-long attempt did not work either:
pd.wide_to_long(excel_df_final, stubnames = ['IA', 'Raw', 'Class1', 'Class2'], i=idVars, j=valueVars)
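The error I get is:
ValueError: operands could not be broadcast together with shapes (95,) (431,)
My dataset actually has 526 columns, so I split them into two lists: one contains the 95 column names that will be i, and the remaining 431 are the columns I need to turn into rows, as in the sample dataset.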
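Answer 0 (score: 2)
This will get you started. The idea is to use set_index, convert the columns to a MultiIndex, and then stack. There may be better solutions, but this is how I would do it, since these are simple steps toward the output.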
# Set the index with columns that we don't want to "transpose"
df2 = df.set_index([
'ID', 'Country', 'Type', 'Region', 'Gender', 'QA_Include', 'QA_Comments'])
# Convert headers to MultiIndex -- this is so we can melt IA values
df2.columns = pd.MultiIndex.from_tuples(map(tuple, df2.columns.str.split('_')))
# Call stack to replicate data, then reset the index
out = df2.stack(level=0).reset_index().rename({'level_7': 'IA'}, axis=1)
out
ID Country Type Region Gender QA_Include QA_Comments IA Class1 Class2 Raw
0 SC1 France A Europe Male yes NaN IA01 8 1 4
1 SC1 France A Europe Male yes NaN IA02 4 1 J
2 SC2 France A Europe Female yes NaN IA01 7 2 2
3 SC2 France A Europe Female yes NaN IA02 6 4 Q
4 SC3 France B Europe Male yes NaN IA01 7 2 3
5 SC3 France B Europe Male yes NaN IA02 8 2 K
6 SC4 France A Europe Male yes NaN IA01 8 2 4
7 SC4 France A Europe Male yes NaN IA02 2 1 A
8 SC5 France B Europe Male yes NaN IA01 7 1 1
9 SC5 France B Europe Male yes NaN IA02 1 3 F
10 ID6 France A Europe Male yes NaN IA01 8 1 2
11 ID6 France A Europe Male yes NaN IA02 3 7 R
12 ID7 France B Europe Male yes NaN IA01 8 1 2
13 ID7 France B Europe Male yes NaN IA02 4 6 Q
14 UC8 France B Europe Male yes NaN IA01 8 2 4
15 UC8 France B Europe Male yes NaN IA02 4 2 P
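As an alternative sketch (not part of the answer above), pd.wide_to_long can produce the same long layout once each stub name is a column prefix. wide_to_long expects stubnames to be prefixes of the wide column names and j to be a single string naming the new column, so the example first renames IA01_Raw to Raw_IA01. The two-row frame, the names pattern, swapped and long_df, and the assumption that ID alone uniquely identifies a row are illustrative and not taken from the original post.

```python
import re

import pandas as pd

# Two-row subset of the sample data (illustrative only).
df = pd.DataFrame({
    "ID": ["SC1", "SC2"],
    "Country": ["France", "France"],
    "QA_Include": ["yes", "yes"],
    "IA01_Raw": [4, 2],
    "IA01_Class1": [8, 7],
    "IA01_Class2": [1, 2],
    "IA02_Raw": ["J", "Q"],
    "IA02_Class1": [4, 6],
    "IA02_Class2": [1, 4],
})

# Swap "IA01_Raw" -> "Raw_IA01" so that each stub name becomes a prefix.
# Columns that do not match the pattern (ID, Country, ...) are left unchanged.
pattern = re.compile(r"^(IA\d+)_(.+)$")
swapped = df.rename(columns=lambda c: pattern.sub(r"\2_\1", c))

# Assumption: "ID" is unique per row; wide_to_long requires unique i values.
long_df = (
    pd.wide_to_long(
        swapped,
        stubnames=["Raw", "Class1", "Class2"],
        i="ID",
        j="IA",            # a single column name, not a list of columns
        sep="_",
        suffix=r"IA\d+",   # the part after the separator, e.g. "IA01"
    )
    .reset_index()
    .sort_values(["ID", "IA"], ignore_index=True)
)
print(long_df)
```

With many IA columns, the rename step means the column lists never have to be sliced by position; only the stub names need to be spelled out.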
Answer 1 (score: 1)
You can use pd.lreshape:
pd.lreshape(df.assign(IA01=['01']*len(df), IA02=['02']*len(df), IA09=['09']*len(df)),
            {'IA': ['IA01', 'IA02', 'IA09'],
             'Raw': ['IA01_Raw', 'IA02_Raw', 'IA09_Raw'],
             'Class1': ['IA01_Class1', 'IA02_Class1', 'IA09_Class1'],
             'Class2': ['IA01_Class2', 'IA02_Class2', 'IA09_Class2']
            })
Edit: just add the column names from your input to the lists inside the dictionary, under the Raw/Class1/Class2 keys you want in the output:
pd.lreshape(df.assign(IA01=['01']*len(df), IA02=['02']*len(df), IA09=['09']*len(df)),
            {'IA': ['IA01', 'IA02', 'IA09'],
             'Raw': ['IA01_Raw_baseline', 'IA02_Raw_midline', 'IA09_Raw_whatever'],
             'Class1': ['IA01_Class1_baseline', 'IA02_Class1_midline', 'IA09_Class1_whatever'],
             'Class2': ['IA01_Class2_baseline', 'IA02_Class2_midline', 'IA09_Class2_whatever']
            })
No documentation is available for this function. Use help(pd.lreshape) or refer here.
Output:
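To see what the call produces, here is a minimal sketch on a two-row subset of the sample data. It uses only IA01 and IA02, because those are the only IA columns in the sample. The names labels, tagged, groups and long_df are illustrative assumptions. Note that lreshape drops rows with missing values in the reshaped columns by default (dropna=True).

```python
import pandas as pd

# Two-row subset of the sample data (illustrative only).
df = pd.DataFrame({
    "ID": ["SC1", "SC2"],
    "Country": ["France", "France"],
    "IA01_Raw": [4, 2],
    "IA01_Class1": [8, 7],
    "IA01_Class2": [1, 2],
    "IA02_Raw": ["J", "Q"],
    "IA02_Class1": [4, 6],
    "IA02_Class2": [1, 4],
})

labels = ["01", "02"]

# Add one constant helper column per IA block ("IA01" -> "01", ...),
# so that the block label can be reshaped alongside the values.
tagged = df.assign(**{f"IA{x}": x for x in labels})

# Each key is an output column; each list names the input columns that feed
# it, in the same block order. All lists must have the same length.
groups = {
    "IA": [f"IA{x}" for x in labels],
    "Raw": [f"IA{x}_Raw" for x in labels],
    "Class1": [f"IA{x}_Class1" for x in labels],
    "Class2": [f"IA{x}_Class2" for x in labels],
}

long_df = pd.lreshape(tagged, groups)
print(long_df)
```

Building the lists from labels avoids typing every column name by hand when there are many IA blocks.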