Calculating differences from a reference row in pandas (Python)

Time: 2015-05-14 13:30:18

Tags: python pandas row dataframe

In pandas, I have a DataFrame like this:

                       value
SampleGroup   sample              
Group1        ref      18.1
              smp1     NaN
              smp2     20.3
              smp3     30.0
              smp4     23.8
              smp5     23.2

What I want to do is add a new column in which the reference value (ref) has been subtracted from every sample (smp). Like this:

                       value   deltaValue
SampleGroup   sample              
Group1        ref      18.1    0
              smp1     NaN     NaN
              smp2     20.3    2.2
              smp3     30.0    11.9
              smp4     23.8    5.7
              smp5     23.2    5.1

Does anyone know how to do this? Thanks!

1 Answer:

Answer 0: (score: 2)

OK, I broke this down into steps; the following worked for me:

In [327]:

import io
import pandas as pd

t="""sample value
ref      18.1
              smp1     NaN
              smp2     20.3
              smp3     30.0
              smp4     23.8
              smp5     23.2"""
df = pd.read_csv(io.StringIO(t), sep=r'\s+')
df
Out[327]:
  sample  value
0    ref   18.1
1   smp1    NaN
2   smp2   20.3
3   smp3   30.0
4   smp4   23.8
5   smp5   23.2
In [328]:

df['Group'] = 'Group1'
df
Out[328]:
  sample  value   Group
0    ref   18.1  Group1
1   smp1    NaN  Group1
2   smp2   20.3  Group1
3   smp3   30.0  Group1
4   smp4   23.8  Group1
5   smp5   23.2  Group1
In [329]:

df1 = df.set_index(['Group', 'sample'])
df1
Out[329]:
               value
Group  sample       
Group1 ref      18.1
       smp1      NaN
       smp2     20.3
       smp3     30.0
       smp4     23.8
       smp5     23.2

In [337]:

df1['deltaValue'] = df1['value'].sub(df1.loc[('Group1','ref'), 'value'])
df1
Out[337]:
               value  deltaValue
Group  sample                   
Group1 ref      18.1         0.0
       smp1      NaN         NaN
       smp2     20.3         2.2
       smp3     30.0        11.9
       smp4     23.8         5.7
       smp5     23.2         5.1

The following also works, using the - operator instead of .sub():

df1['deltaValue'] = df1['value'] - df1.loc[('Group1','ref'), 'value']
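
Both versions above hard-code 'Group1'. If the real data holds several sample groups, each with its own ref row, one possible generalization is to pull out one reference value per group and let pandas broadcast it across the SampleGroup level. This is only a sketch under assumptions: Group2 and its numbers are made up for illustration, and it assumes every group has exactly one ref row.

```python
import pandas as pd

# Hypothetical data: Group2 and its values are invented for illustration.
df = pd.DataFrame(
    {
        "SampleGroup": ["Group1"] * 3 + ["Group2"] * 3,
        "sample": ["ref", "smp1", "smp2"] * 2,
        "value": [18.1, float("nan"), 20.3, 10.0, 12.5, 9.0],
    }
).set_index(["SampleGroup", "sample"])

# One reference value per group; the result is indexed by SampleGroup only.
ref = df["value"].xs("ref", level="sample")

# Broadcast each group's reference across its rows, then subtract.
df["deltaValue"] = df["value"].sub(ref, level="SampleGroup")
print(df)
```

As in the single-group case, the NaN for smp1 carries through to deltaValue, and each ref row comes out as 0.0.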