Question

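I have a DataFrame (df) in which column A holds drug doses (in units), administered at the times given by the Timestamp index. I want to fill the missing values (NaN) with the drug concentration implied by the drug's half-life (180 minutes). I am struggling with the pandas code. Any help and insight would be much appreciated. Thanks in advance.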

df
                       A     
Timestamp                                                      
1991-04-21 09:09:00   9.0        
1991-04-21 3:00:00   NaN       
1991-04-21 9:00:00   NaN       
1991-04-22 07:35:00  10.0      
1991-04-22 13:40:00   NaN        
1991-04-22 16:56:00   NaN

The drug's half-life is 180 minutes. I want to fillna(values) as a function of the elapsed time and the drug's half-life.

Something like:

Timestamp             A     

1991-04-21 09:00:00   9.0  
1991-04-21 3:00:00   ~2.25   
1991-04-21 9:00:00   ~0.55   
1991-04-22 07:35:00  10.0  
1991-04-22 13:40:00   ~2.5   
1991-04-22 16:56:00   ~0.75

Answer 1

Your timestamps aren't sorted; I assume that's a typo. I've fixed it below.

import pandas as pd
import numpy as np
from io import StringIO  # Python 3 (Python 2 used: from StringIO import StringIO)

text = """TimeStamp                    A     
1991-04-21 09:09:00   9.0        
1991-04-21 13:00:00   NaN       
1991-04-21 19:00:00   NaN       
1991-04-22 07:35:00  10.0      
1991-04-22 13:40:00   NaN        
1991-04-22 16:56:00   NaN  """

df = pd.read_csv(StringIO(text), sep=r'\s{2,}', engine='python', parse_dates=[0])

Here is the magic code.

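# half-life of 180 minutes is 10,800 seconds
# we need to calculate lamda (intentionally misspelled: lambda is a Python keyword)
# lamda = half-life / ln(2), so exp(-t / lamda) == 0.5 ** (t / half-life)
lamda = 10800 / np.log(2)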

# returns time difference for each element
# relative to first element
def time_diff(x):
    return x - x.iloc[0]

# create partition of non-nulls with subsequent nulls
partition = df.A.notnull().cumsum()

# calculate time differences in seconds for each
# element relative to most recent non-null observation
# use .dt accessor and method .total_seconds()
# group_keys=False keeps the original index so the result aligns with df.A
tdiffs = df.TimeStamp.groupby(partition, group_keys=False).apply(time_diff).dt.total_seconds()

# apply exponential decay
decay = np.exp(-tdiffs / lamda)

# finally, forward fill the observations and multiply by decay
decay * df.A.ffill()

0     9.000000
1     3.697606
2     0.924402
3    10.000000
4     2.452325
5     1.152895
dtype: float64
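
The same steps can also be wrapped in a small reusable function. The following is only a sketch under a few assumptions: the function name `fill_decay` and its parameters are hypothetical (they are not part of the answer above), the timestamps are already sorted, and, like the code above, it decays only the most recent dose rather than adding any residual from earlier doses. It uses `transform("first")` instead of `apply`, and the half-life form `0.5 ** (t / half_life)` instead of `exp(-t / lamda)`; the two are mathematically equivalent.

```python
import numpy as np
import pandas as pd


def fill_decay(df, value_col="A", time_col="TimeStamp", half_life_minutes=180.0):
    """Fill NaNs in value_col by decaying the most recent non-null dose.

    Hypothetical helper for illustration; assumes df is sorted by time_col.
    """
    # Each recorded dose starts a new group that also holds the NaN rows after it.
    group = df[value_col].notna().cumsum()
    # Timestamp of the dose that opens each group, broadcast to every row.
    dose_time = df[time_col].groupby(group).transform("first")
    # Minutes elapsed since the most recent dose (0 on the dose rows themselves).
    elapsed_min = (df[time_col] - dose_time).dt.total_seconds() / 60.0
    # Half-life form of exponential decay: the value halves every half_life_minutes.
    return df[value_col].ffill() * 0.5 ** (elapsed_min / half_life_minutes)


df = pd.DataFrame({
    "TimeStamp": pd.to_datetime([
        "1991-04-21 09:09:00", "1991-04-21 13:00:00", "1991-04-21 19:00:00",
        "1991-04-22 07:35:00", "1991-04-22 13:40:00", "1991-04-22 16:56:00",
    ]),
    "A": [9.0, np.nan, np.nan, 10.0, np.nan, np.nan],
})

filled = fill_decay(df)
print(filled)
```

On the corrected sample data this should reproduce the values shown above. To write the result back, something like `df["A"] = fill_decay(df)` would do.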

How to fill missing values in an irregular time series of drug doses given the half-life
