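In pandas, I have a DataFrame made up of two groups, each containing several samples. Each group has an internal reference value that I want to subtract from all the sample values in that group.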
import io
import pandas as pd

s = u"""Group sample value
group1 ref1 18.1
group1 smp1 NaN
group1 smp2 20.3
group1 smp3 30.0
group2 ref2 16.1
group2 smp4 29.2
group2 smp5 19.9
group2 smp6 28.9
"""
df = pd.read_csv(io.StringIO(s), sep=r'\s+')  # raw string so \s is passed through as a regex
df = df.set_index(['Group', 'sample'])
df
Out[82]:
               value
Group  sample
group1 ref1     18.1
       smp1      NaN
       smp2     20.3
       smp3     30.0
group2 ref2     16.1
       smp4     29.2
       smp5     19.9
       smp6     28.9
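What I would like to do is add a new column in which each group's reference (ref) has been subtracted from every sample (smp) in that group. Like this: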
               value  deltaValue
Group  sample
group1 ref1     18.1         0.0
       smp1      NaN         NaN
       smp2     20.3         2.2
       smp3     30.0        11.9
group2 ref2     16.1         0.0
       smp4     29.2        13.1
       smp5     19.9         3.8
       smp6     28.9        12.8
Does anyone know how to do this? Thanks!
Answer 0 (score: 0)
Group your DataFrame by the Group column. Then iterate over each group, look up the value of its ref sample, and subtract that value from the group's whole value column.
> df = pd.read_csv(io.StringIO(s), sep=r'\s+')
> df['diff'] = 0.0
> for name, group in df.groupby('Group'):
      # 'group1' -> 'ref1', 'group2' -> 'ref2'
      ref_name = 'ref' + name.split('group')[1]
      ref = group.loc[group['sample'] == ref_name, 'value'].iloc[0]
      # value minus reference, written back to this group's rows only
      df.loc[group.index, 'diff'] = group['value'] - ref
> print(df)
    Group sample  value  diff
0  group1   ref1   18.1   0.0
1  group1   smp1    NaN   NaN
2  group1   smp2   20.3   2.2
3  group1   smp3   30.0  11.9
4  group2   ref2   16.1   0.0
5  group2   smp4   29.2  13.1
6  group2   smp5   19.9   3.8
7  group2   smp6   28.9  12.8
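The question's frame is indexed by (Group, sample), whereas this answer works on the flat frame. A minimal sketch of the same subtraction done directly on the indexed frame follows. It assumes each group has exactly one row whose sample label starts with "ref"; the names is_ref and ref are only illustrative, and the column name deltaValue is taken from the question.

```python
import io
import pandas as pd

s = """Group sample value
group1 ref1 18.1
group1 smp1 NaN
group1 smp2 20.3
group1 smp3 30.0
group2 ref2 16.1
group2 smp4 29.2
group2 smp5 19.9
group2 smp6 28.9
"""
df = pd.read_csv(io.StringIO(s), sep=r"\s+").set_index(["Group", "sample"])

# Boolean mask: True for rows whose 'sample' index label starts with 'ref'
# (assumption: exactly one such row per group).
is_ref = df.index.get_level_values("sample").str.startswith("ref")

# Reference values indexed by Group only: group1 -> 18.1, group2 -> 16.1
ref = df.loc[is_ref, "value"].droplevel("sample")

# Subtract the reference, matching it to rows on the 'Group' index level.
df["deltaValue"] = df["value"].sub(ref, level="Group")
print(df)
```

Because the reference Series is matched on the Group level, no reset_index or explicit loop is needed, and NaN sample values simply stay NaN.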
Answer 1 (score: 0)
Here is one way to do it without a loop.
First, create a function func that finds the sample whose name starts with ref and then computes the delta values.
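In [33]: def func(grp):
             grp = grp.copy()  # work on a copy of the group
             # value of the row whose sample name starts with 'ref'
             ref = grp.loc[grp['sample'].str.startswith('ref'), 'value']
             grp['delta'] = grp['value'] - ref.values[0]
             return grp
Then apply func to dff.groupby('Group'):
In [34]: dff.groupby('Group', group_keys=False).apply(func)  # group_keys=False keeps the original row index
Out[34]:
    Group sample  value  delta
0  group1   ref1   18.1    0.0
1  group1   smp1    NaN    NaN
2  group1   smp2   20.3    2.2
3  group1   smp3   30.0   11.9
4  group2   ref2   16.1    0.0
5  group2   smp4   29.2   13.1
6  group2   smp5   19.9    3.8
7  group2   smp6   28.9   12.8
Here dff is your df with its index reset first, which can be created with:
dff = df.reset_index()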
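A possible variant that avoids apply altogether is sketched below. It is not part of the answer above, only an illustration using groupby(...).transform("first"). It assumes each group has exactly one row whose sample name starts with "ref"; the names is_ref and ref_per_row are hypothetical.

```python
import io
import pandas as pd

s = """Group sample value
group1 ref1 18.1
group1 smp1 NaN
group1 smp2 20.3
group1 smp3 30.0
group2 ref2 16.1
group2 smp4 29.2
group2 smp5 19.9
group2 smp6 28.9
"""
dff = pd.read_csv(io.StringIO(s), sep=r"\s+")

# True only for the reference rows (assumption: one per group).
is_ref = dff["sample"].str.startswith("ref")

# Blank out every non-reference value, then broadcast the first non-null
# value of each group (i.e. its reference) to all rows of that group.
ref_per_row = dff["value"].where(is_ref).groupby(dff["Group"]).transform("first")

dff["delta"] = dff["value"] - ref_per_row
print(dff)
```

Since transform returns a Series aligned with the original rows, the result can be subtracted directly and the row order and index of dff are left untouched.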