From pandas

Time: 2015-05-15 11:58:01

Tags: python pandas row calc

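In pandas, I have a dataframe made up of two groups, each containing several samples. Each group has an internal reference value that I want to subtract from every sample value in that group.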

import io
import pandas as pd

s = u"""Group    sample    value
group1    ref1    18.1
group1    smp1    NaN
group1    smp2    20.3
group1    smp3    30.0
group2    ref2    16.1
group2    smp4    29.2
group2    smp5    19.9
group2    smp6    28.9
"""
df = pd.read_csv(io.StringIO(s), sep=r'\s+')
df = df.set_index(['Group', 'sample'])
df

Out[82]: 

                 value    
Group    sample
group1   ref1    18.1
         smp1    NaN
         smp2    20.3
         smp3    30.0
group2   ref2    16.1
         smp4    29.2
         smp5    19.9
         smp6    28.9

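What I want to do is add a new column in which the reference (ref) has been subtracted from every sample (smp) in the corresponding group. Like this: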

                   value   deltaValue
SampleGroup   sample              
Group1        ref      18.1    0
              smp1     NaN     NaN
              smp2     20.3    2.2
              smp3     30.0    11.9
Group2        ref2     16.1    0
              smp4     29.2    13.1
              smp5     19.9    3.8
              smp6     28.9    12.8

Does anyone know how to do this? Thanks!

2 answers:

Answer 0: (score: 0)

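Group your dataframe by the Group column. Then loop over each group, look up the value of its ref sample, and subtract the whole value column from it. Note that this computes the reference minus each value, so the signs are the opposite of those in the question; swap the operands to match.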

> df = pd.read_csv(io.StringIO(s), sep=r'\s+')
> df['diff'] = 0.0
> df_group = df.groupby('Group')
> for index, group in df_group:
      # the reference row is named 'ref' + the group number, e.g. 'ref1' for 'group1'
      ref = group[group['sample'] == 'ref' + index.split('group')[1]]['value'].values[0]
      df.loc[group.index, 'diff'] = ref - group['value']
> print(df)
    Group sample  value  diff
0  group1   ref1   18.1   0.0
1  group1   smp1    NaN   NaN
2  group1   smp2   20.3  -2.2
3  group1   smp3   30.0 -11.9
4  group2   ref2   16.1   0.0
5  group2   smp4   29.2 -13.1
6  group2   smp5   19.9  -3.8
7  group2   smp6   28.9 -12.8
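
As a hedged variant of the loop above, the sketch below works directly on the MultiIndexed frame from the question and subtracts in the order the question asks for (value minus reference). It assumes that each group has exactly one sample whose name starts with 'ref'; the column name deltaValue is taken from the question's desired output.

```python
import io

import pandas as pd

s = """Group    sample    value
group1    ref1    18.1
group1    smp1    NaN
group1    smp2    20.3
group1    smp3    30.0
group2    ref2    16.1
group2    smp4    29.2
group2    smp5    19.9
group2    smp6    28.9
"""
df = pd.read_csv(io.StringIO(s), sep=r"\s+").set_index(["Group", "sample"])

groups = df.index.get_level_values("Group")
# Assumption: reference rows are the ones whose sample name starts with 'ref'.
is_ref = df.index.get_level_values("sample").str.startswith("ref")

df["deltaValue"] = float("nan")
for name in groups.unique():
    in_group = groups == name
    # Take the single reference value of this group.
    ref_value = df.loc[in_group & is_ref, "value"].iloc[0]
    # Subtract it from every row of the same group (NaN stays NaN).
    df.loc[in_group, "deltaValue"] = df.loc[in_group, "value"] - ref_value

print(df)
```

The boolean masks keep the original (Group, sample) index intact, so no reset_index is needed before or after the loop.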

Answer 1: (score: 0)

Here is a way to do it without a loop.

First create a function func that finds the sample whose name starts with ref and then computes the delta value.

In [33]: def func(grp):
             # value of the group's reference row (sample name starts with 'ref')
             ref = grp.loc[grp['sample'].str.startswith('ref'), 'value']
             grp['delta'] = grp['value'] - ref.values[0]
             return grp

Then call dff.groupby('Group') and apply func:

In [34]: dff.groupby('Group').apply(func)
Out[34]:
    Group sample  value  delta
0  group1   ref1   18.1    0.0
1  group1   smp1    NaN    NaN
2  group1   smp2   20.3    2.2
3  group1   smp3   30.0   11.9
4  group2   ref2   16.1    0.0
5  group2   smp4   29.2   13.1
6  group2   smp5   19.9    3.8
7  group2   smp6   28.9   12.8

Here dff is your df with the index reset, which you can create with:

dff = df.reset_index()
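
The same result can probably be reached without groupby.apply at all. The following is a minimal sketch, not the answer author's method: it builds a per-group lookup of reference values and maps it back onto the rows. It assumes exactly one 'ref...' sample per group; the names is_ref, ref_by_group and result are made up for illustration.

```python
import io

import pandas as pd

s = """Group    sample    value
group1    ref1    18.1
group1    smp1    NaN
group1    smp2    20.3
group1    smp3    30.0
group2    ref2    16.1
group2    smp4    29.2
group2    smp5    19.9
group2    smp6    28.9
"""
dff = pd.read_csv(io.StringIO(s), sep=r"\s+")

# Assumption: exactly one row per group has a sample name starting with 'ref'.
is_ref = dff["sample"].str.startswith("ref")
# Series indexed by group name, holding that group's reference value.
ref_by_group = dff.loc[is_ref].set_index("Group")["value"]

# Look up each row's group reference and subtract it from the row's value.
dff["delta"] = dff["value"] - dff["Group"].map(ref_by_group)

# Restore the (Group, sample) index used in the question.
result = dff.set_index(["Group", "sample"])
print(result)
```

If a group could contain more than one reference row, set_index would produce a non-unique lookup and map would fail, so that case would need an explicit rule (for example, taking the first reference).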