我有一个多级数据框df
。作为专栏,我有不同的"对象"
我分析一下。作为行索引,我有一个案例ID lc
和时间t
。
我需要为每个案例lc
找到时间t
(理想情况下是内插的,但是
最接近的值很好),每个对象达到目标值。
此目标值是时间t==0
处给定对象的函数。
import pandas as pd
print(pd.__version__)
0.16.2
虚拟数据集示例:
data = {1: {(1014, 0.0): 20.25,
(1014, 0.0991): 19.08,
(1014, 0.1991): 18.43,
(1014, 0.2991): 19.03,
(1014, 0.3991): 18.71,
(1015, 0.0): 20.22,
(1015, 0.0991): 19.3,
(1015, 0.1991): 18.68,
(1015, 0.2991): 18.22,
(1015, 0.3991): 17.84,
(1016, 0.0): 21.75,
(1016, 0.0991): 19.97,
(1016, 0.1991): 19.65,
(1016, 0.2991): 19.29,
(1016, 0.3991): 18.94
},
2: {(1014, 0.0): 29.11,
(1014, 0.0991): 28.68,
(1014, 0.1991): 28.27,
(1014, 0.2991): 27.46,
(1014, 0.3991): 26.96,
(1015, 0.0): 29.22,
(1015, 0.0991): 28.64,
(1015, 0.1991): 28.18,
(1015, 0.2991): 27.74,
(1015, 0.3991): 27.25,
(1016, 0.0): 29.17,
(1016, 0.0991): 28.68,
(1016, 0.1991): 28.17,
(1016, 0.2991): 27.68,
(1016, 0.3991): 27.18
},
3: {(1014, 0.0): 22.01,
(1014, 0.0991): 21.5,
(1014, 0.1991): 21.18,
(1014, 0.2991): 20.58,
(1014, 0.3991): 20.21,
(1015, 0.0): 21.81,
(1015, 0.0991): 21.46,
(1015, 0.1991): 21.11,
(1015, 0.2991): 20.78,
(1015, 0.3991): 20.42,
(1016, 0.0): 21.82,
(1016, 0.0991): 21.49,
(1016, 0.1991): 21.11,
(1016, 0.2991): 20.75,
(1016, 0.3991): 20.37
}}
df = pd.DataFrame(data).sort()
df.index.names=['case', 't']
Dataframe看起来像:
1 2 3
case t
1014 0.0000 20.25 29.11 22.01
0.0991 19.08 28.68 21.50
0.1991 18.43 28.27 21.18
0.2991 19.03 27.46 20.58
0.3991 18.71 26.96 20.21
1015 0.0000 20.22 29.22 21.81
0.0991 19.30 28.64 21.46
0.1991 18.68 28.18 21.11
0.2991 18.22 27.74 20.78
0.3991 17.84 27.25 20.42
1016 0.0000 21.75 29.17 21.82
0.0991 19.97 28.68 21.49
0.1991 19.65 28.17 21.11
0.2991 19.29 27.68 20.75
0.3991 18.94 27.18 20.37
目标值是时间t==0
的值的函数。
通常,对于半衰期,这将是k = 0.5。对于当前样本,我们将采用k = 0.926
由于值已排序,因此可以针对每种情况采用第一行。
targets = df.groupby(level='case').first() * 0.926
print(targets)
1 2 3
case
1014 18.75150 26.95586 20.38126
1015 18.72372 27.05772 20.19606
1016 20.14050 27.01142 20.20532
现在,我怎么能简单地构建以下数据帧,如图所示
时间t
每个对象达到上面计算的目标值?
1 2 3
case
1014 0.3991 0.3991 0.2991
1015 0.1991 0.3991 0.3991
1016 0.0991 0.3991 0.3991
答案 0 :(得分:1)
这些有点像黑客,让我们看看是否有更好的解决方案:
In [36]:
targets['t']=0
In [37]:
df2 = df.reset_index().set_index('case') - targets
In [38]:
df3 = df2.groupby(df2.index).transform(lambda x: x.abs()==np.min(x.abs()))
In [39]:
df4 = pd.DataFrame({'1': df2.t[df3[1]],
'2': df2.t[df3[2]],
'3': df2.t[df3[3]]})
print df4
1 2 3
case
1014 0.3991 0.3991 0.3991
1015 0.1991 0.3991 0.3991
1016 0.0991 0.3991 0.3991