I have a dataset that I read in like this:
import pandas as pd
data = pd.read_excel('.../data.xlsx')
It looks like this:
Out[57]:
Block Concentration Name value
1 100 GlcNAc2 321
1 100 GlcNAc2 139
1 100 GlcNAc2 202
1 33 GlcNAc2 86
1 33 GlcNAc2 194
1 33 GlcNAc2 452
1 100 BCC 345
1 100 BCC 6
1 100 BCC 34
1 33 BCC 11
1 33 BCC 53
1 33 BCC 87
1 0 Print buffer 127
1 0 Print buffer 55
1 0 Print buffer 67
... ... ... ... ... ...
24 0 Print buffer -9968
24 0 Print buffer -4526
24 0 Print buffer 14246
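For each Block and Name, I want to add three rows with Concentration '0' and copy the three 'Print buffer' values from that block into those three new '0'-concentration rows: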
Out[57]:
Block Concentration Name value
1 0 GlcNAc2 127
1 0 GlcNAc2 55
1 0 GlcNAc2 67
1 100 GlcNAc2 321
1 100 GlcNAc2 139
1 100 GlcNAc2 202
1 33 GlcNAc2 86
1 33 GlcNAc2 194
1 33 GlcNAc2 452
1 0 BCC 127
1 0 BCC 55
1 0 BCC 67
1 100 BCC 345
1 100 BCC 6
1 100 BCC 34
1 33 BCC 11
1 33 BCC 53
1 33 BCC 87
1 0 Print buffer 127
1 0 Print buffer 55
1 0 Print buffer 67
...... ...... ...... ......
24 0 Print buffer -9968
24 0 Print buffer -4526
24 0 Print buffer 14246
Then I want to compute the mean of the three 'Print buffer' values and subtract it from every value in the same block.
Desired output:
Out[57]:
Block Concentration Name value newvalue
1 0 GlcNAc2 127 127-mean(127, 55, 67)
1 0 GlcNAc2 55 55-mean(127, 55, 67)
1 0 GlcNAc2 67 67-mean(127, 55, 67)
1 100 GlcNAc2 321 321-mean(127, 55, 67)
1 100 GlcNAc2 139 139-mean(127, 55, 67)
1 100 GlcNAc2 202 ....
1 33 GlcNAc2 86
1 33 GlcNAc2 194
1 33 GlcNAc2 452
1 0 BCC 127
1 0 BCC 55
1 0 BCC 67
1 100 BCC 345
1 100 BCC 6
1 100 BCC 34
1 33 BCC 11
1 33 BCC 53
1 33 BCC 87
1 0 Print buffer 127
1 0 Print buffer 55
1 0 Print buffer 67
... ... ... ... ... ...
24 0 Print buffer -9968
24 0 Print buffer -4526
24 0 Print buffer 14246
for each block
    for each Name
        add concentration '0' three times
        append the three values of 'print buffer' to the three '0' concentrations
    newvalue = value - average(three print buffer)
Answer 0 (score: 1)
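Consider using groupby apply functions on the dataset. The first function computes the mean() of the 'Print buffer' values only; every group that contains no 'Print buffer' rows gets NaN in meanvalue. The second function then takes the max() of meanvalue within each Block, which spreads the single non-NaN mean to all rows of that block. Finally, create newvalue as the arithmetic difference: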
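def add_mean_value(mgrp):
    # Mean of the 'Print buffer' values in this group (NaN if the group has none)
    mgrp['meanvalue'] = mgrp[mgrp['Name'] == 'Print buffer']['value'].mean()
    return mgrp

# group_keys=False keeps the original index, so 'Block' stays an ordinary column
data = data.groupby(['Block', 'Concentration', 'Name'], group_keys=False).apply(add_mean_value)

def max_sum_value(mgrp):
    # Spread the block's single non-NaN mean to every row of the block
    mgrp['meanvalue'] = mgrp['meanvalue'].max()
    return mgrp

data = data.groupby(['Block'], group_keys=False).apply(max_sum_value)

data['newvalue'] = data['value'] - data['meanvalue']
print(data)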
Output
Block Concentration Name value meanvalue newvalue
0 1 100 GlcNAc2 321 83 238
1 1 100 GlcNAc2 139 83 56
2 1 100 GlcNAc2 202 83 119
3 1 33 GlcNAc2 86 83 3
4 1 33 GlcNAc2 194 83 111
5 1 33 GlcNAc2 452 83 369
6 1 100 BCC 345 83 262
7 1 100 BCC 6 83 -77
8 1 100 BCC 34 83 -49
9 1 33 BCC 11 83 -72
10 1 33 BCC 53 83 -30
11 1 33 BCC 87 83 4
12 1 0 Print buffer 127 83 44
13 1 0 Print buffer 55 83 -28
14 1 0 Print buffer 67 83 -16
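
Note that the output above subtracts the buffer mean but does not add the three '0'-concentration rows per Name that the question asks for. The following is a minimal sketch of one way to do both steps without apply. It uses a merge to copy the buffer readings to each Name, and a per-block mean looked up with map. The small DataFrame is a stand-in for the Excel file (only Block 1 from the question), and the variable names is_buffer, samples, zero_rows, full and block_mean are illustrative, not taken from the original post.

```python
import pandas as pd

# Stand-in for pd.read_excel(...): Block 1 from the question.
data = pd.DataFrame({
    "Block": [1] * 15,
    "Concentration": [100] * 3 + [33] * 3 + [100] * 3 + [33] * 3 + [0] * 3,
    "Name": ["GlcNAc2"] * 6 + ["BCC"] * 6 + ["Print buffer"] * 3,
    "value": [321, 139, 202, 86, 194, 452, 345, 6, 34, 11, 53, 87, 127, 55, 67],
})

is_buffer = data["Name"] == "Print buffer"
buffer_rows = data[is_buffer]

# One (Block, Name) pair per non-buffer sample.
samples = data.loc[~is_buffer, ["Block", "Name"]].drop_duplicates()

# Pair every sample with the buffer rows of its own block; the merge
# repeats the buffer readings (Concentration 0) once per sample Name.
zero_rows = samples.merge(
    buffer_rows[["Block", "Concentration", "value"]], on="Block", how="inner"
)

# Stack the new zero-concentration rows onto the original data.
full = pd.concat([data, zero_rows], ignore_index=True)

# Mean buffer reading per block, looked up for every row via its Block.
block_mean = buffer_rows.groupby("Block")["value"].mean()
full["newvalue"] = full["value"] - full["Block"].map(block_mean)

# Optional: group rows by Block, Name and Concentration for readability.
full = full.sort_values(["Block", "Name", "Concentration"]).reset_index(drop=True)
print(full)
```

Because the merge is keyed on Block only, each block's samples receive that block's own buffer readings, so the same code should carry over to all 24 blocks, assuming every block contains 'Print buffer' rows.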