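I am working on a neonatal project. Long story short, newborns are assigned a score based on their symptoms at a given time point, and depending on how that score changes over time we decide whether to increase the medication dose, keep it the same, or wean. We denote these three states as +1 (increase), 0 (keep) and -1 (wean). The rules for deciding what to do are as follows:
With the help of people here, we have code for the rules that increase and keep the dose. However, I am struggling to write the rule that determines when to decrease the dose. Here is a sample of the code we have: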
import pandas as pd
df = pd.DataFrame({
'baby': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B', 'B', 'B', 'B', 'B','B','B'],
'dateandtime': ['8/2/2009 5:00:00 PM', '7/19/2009 5:00:00 PM', '7/19/2009 5:00:00 PM', '7/17/2009 6:00:00 AM','7/17/2009 12:01:00 AM', '7/14/2009 12:01:00 AM', '7/19/2009 5:00:00 AM', '7/16/2009 9:00:00 PM','7/19/2009 9:00:00 AM', '7/14/2009 6:00:00 PM', '7/15/2009 3:04:00 PM', '7/20/2009 5:00:00 PM','7/16/2009 12:01:00 AM', '7/18/2009 1:00:00 PM', '7/16/2009 6:00:00 AM', '7/13/2009 9:00:00 PM','7/19/2009 1:00:00 AM','7/15/2009 12:04:00 AM'],
'score': [6, 3, 3, 5, 10, 14, 5, 4, 11, 4, 4, 6, 7, 4, 6, 12, 6, 6]
})
df.dateandtime = pd.to_datetime(df['dateandtime']) # change column type for ease of indexing
df = df.set_index('dateandtime')
df.sort_index(inplace = True)
df = df[~df.index.duplicated()] #Remove any duplicated rows
#Calculate conditions
df['sum_3_scores'] = df.groupby('baby')['score'].rolling(3).sum().reset_index(0,drop=True)
df['max_1_score'] = df.groupby('baby')['score'].rolling(1).max().reset_index(0,drop=True)
#you don't need to calculate the 24hr mean: if the 48hr max is < 8, the 24hr mean will also be < 8
#df['mean_24hr_score'] = df.groupby('baby')['score'].rolling('24h').mean().reset_index(0,drop=True)
#scoring logic
def score(data):
    if data['sum_3_scores'] >= 24 or data['max_1_score'] >= 12:
        return 1
    return 0
df['rule'] = df.apply(score, axis = 1)
# note: the result of this line is not assigned, so df itself is left unchanged
df.reset_index().set_index(['baby','dateandtime']).sort_index()
print(df)
This produces a nice DataFrame with what I want (except for the rule for decreasing the dose):
                     baby  score  sum_3_scores  max_1_score  rule
dateandtime
2009-07-13 21:00:00     B     12           NaN         12.0     1
2009-07-14 00:01:00     A     14           NaN         14.0     1
2009-07-14 18:00:00     B      4           NaN          4.0     0
2009-07-15 00:04:00     B      6          22.0          6.0     0
2009-07-15 15:04:00     B      4          14.0          4.0     0
2009-07-16 00:01:00     B      7          17.0          7.0     0
2009-07-16 06:00:00     B      6          17.0          6.0     0
2009-07-16 21:00:00     A      4           NaN          4.0     0
2009-07-17 00:01:00     A     10          28.0         10.0     1
2009-07-17 06:00:00     A      5          19.0          5.0     0
2009-07-18 13:00:00     B      4          17.0          4.0     0
2009-07-19 01:00:00     B      6          16.0          6.0     0
2009-07-19 05:00:00     A      5          20.0          5.0     0
2009-07-19 09:00:00     A     11          21.0         11.0     0
2009-07-19 17:00:00     A      3          19.0          3.0     0
2009-07-20 17:00:00     B      6          16.0          6.0     0
2009-08-02 17:00:00     A      6          20.0          6.0     0
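What would be a simple way to program the rule for decreasing the dose? I know I can use df.groupby('baby')['score'].rolling('48h') to get a 48h window, but it is not clear to me how to check only the sum of the 3 most recent scores within that window.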
Answer 0 (score: 0)
Your setup:
import pandas as pd
df = pd.DataFrame({
'baby': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B','B', 'B', 'B', 'B', 'B','B','B'],
'dateandtime': ['8/2/2009 5:00:00 PM', '7/19/2009 5:00:00 PM', '7/19/2009 5:00:00 PM', '7/17/2009 6:00:00 AM','7/17/2009 12:01:00 AM', '7/14/2009 12:01:00 AM', '7/19/2009 5:00:00 AM', '7/16/2009 9:00:00 PM','7/19/2009 9:00:00 AM', '7/14/2009 6:00:00 PM', '7/15/2009 3:04:00 PM', '7/20/2009 5:00:00 PM','7/16/2009 12:01:00 AM', '7/18/2009 1:00:00 PM', '7/16/2009 6:00:00 AM', '7/13/2009 9:00:00 PM','7/19/2009 1:00:00 AM','7/15/2009 12:04:00 AM'],
'score': [6, 3, 3, 5, 10, 14, 5, 4, 11, 4, 4, 6, 7, 4, 6, 12, 6, 6]
})
df.dateandtime = pd.to_datetime(df['dateandtime']) # change column type for ease of indexing
df = df.set_index('dateandtime')
df = df[~df.index.duplicated()] #Remove any duplicated rows
I will use .groupby() three times. When checking max_last3, sum_last3 and last48h_any_critical by hand, I recommend sorting by baby and dateandtime:
# this helps
df = df.sort_values(by=['baby', 'dateandtime'])
# this is okay too
df.sort_index(inplace=True)
To get the sum of the last 3 values, group by baby first, then take a rolling window of 3, then sum each window.
Important: if the first two values are, for example, 12 and 13, their sum is already >= 24, but a window of size 3 cannot be built yet! The value is therefore NaN, and (NaN >= 24) == False. To allow incomplete windows, use min_periods=1.
sum_last3 = df.groupby('baby')['score'].rolling(3, min_periods=1).sum()
df['sum_last3'] = sum_last3.reset_index(level=0, drop=True)
df['sum_last3_critical'] = df['sum_last3'] >= 24
df['sum_last3_good'] = df['sum_last3'] < 18
I am still not sure whether you want to look at all scores, the last 3 scores, or only the last score. This implementation detects a value >= 12 among the last 3 scores. Alternative solutions are at the end.
max_last3 = df.groupby('baby')['score'].rolling(3, min_periods=1).max()
df['max_last3'] = max_last3.reset_index(level=0, drop=True)
df['max_last3_ciritical'] = df['max_last3'] >= 12
df['max_last3_good'] = df['max_last3'] < 8
Now you can build a critical column that indicates whether the dose must be increased.
df['critical'] = df['sum_last3_critical'] | df['max_last3_ciritical']
Now take a 48-hour time window and get the maximum of the critical column (1.0 if True, 0.0 if False). Ideally you would use .any(), but it does not exist for rolling GroupBy objects. Since .max() returns a numeric value, it is converted back to a boolean afterwards.
last48h_any_critical = df.groupby('baby').rolling('48h')['critical'].max().astype('bool')
df['last48h_good'] = ~last48h_any_critical.reset_index(level=0, drop=True)
Now you can determine whether the baby is in good condition and the dose should be reduced.
df['good'] = df['last48h_good'] & df['sum_last3_good'] & df['max_last3_good']
To get the action value, simply subtract the good column from the critical column.
df['action'] = df['critical'].astype(int) - df['good'].astype(int)
The resulting DataFrame looks like this:
                     baby  score  sum_last3  sum_last3_critical  sum_last3_good  max_last3  max_last3_ciritical  max_last3_good  critical  last48h_good   good  action
dateandtime
2009-07-14 00:01:00     A     14       14.0               False            True       14.0                 True           False      True         False  False       1
2009-07-16 21:00:00     A      4       18.0               False           False       14.0                 True           False      True         False  False       1
2009-07-17 00:01:00     A     10       28.0                True           False       14.0                 True           False      True         False  False       1
2009-07-17 06:00:00     A      5       19.0               False           False       10.0                False           False     False         False  False       0
2009-07-19 05:00:00     A      5       20.0               False           False       10.0                False           False     False          True  False       0
2009-07-19 09:00:00     A     11       21.0               False           False       11.0                False           False     False          True  False       0
2009-07-19 17:00:00     A      3       19.0               False           False       11.0                False           False     False          True  False       0
2009-08-02 17:00:00     A      6       20.0               False           False       11.0                False           False     False          True  False       0
2009-07-13 21:00:00     B     12       12.0               False            True       12.0                 True           False      True         False  False       1
2009-07-14 18:00:00     B      4       16.0               False            True       12.0                 True           False      True         False  False       1
2009-07-15 00:04:00     B      6       22.0               False           False       12.0                 True           False      True         False  False       1
2009-07-15 15:04:00     B      4       14.0               False            True        6.0                False            True     False         False  False       0
2009-07-16 00:01:00     B      7       17.0               False            True        7.0                False            True     False         False  False       0
2009-07-16 06:00:00     B      6       17.0               False            True        7.0                False            True     False         False  False       0
2009-07-18 13:00:00     B      4       17.0               False            True        7.0                False            True     False          True   True      -1
2009-07-19 01:00:00     B      6       16.0               False            True        6.0                False            True     False          True   True      -1
2009-07-20 17:00:00     B      6       16.0               False            True        6.0                False            True     False          True   True      -1
If you want to look at all previous values instead of only the last three, use expanding instead.
# ideally change name of max_last3 to something like max_alltime
max_last3 = df.groupby('baby')['score'].expanding().max()
df['max_last3'] = max_last3.reset_index(level=0, drop=True)
df['max_last3_ciritical'] = df['max_last3'] >= 12
df['max_last3_good'] = df['max_last3'] < 8
If you only want to look at the last value, you can compare directly with score:
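The snippet for this last variant is missing from the source. What follows is only a minimal sketch of what a direct comparison with score might look like, assuming the same thresholds as the rolling version (>= 12 is critical, < 8 is good) and the same column names, including the 'ciritical' spelling, so that the later critical, good and action steps keep working. The sample data is hypothetical and is not the data from the question.

```python
import pandas as pd

# Small hypothetical sample, not the data from the question.
df = pd.DataFrame(
    {
        'baby': ['A', 'A', 'A', 'B', 'B', 'B'],
        'score': [14, 4, 7, 6, 12, 5],
    },
    index=pd.to_datetime([
        '2009-07-14 00:01', '2009-07-16 21:00', '2009-07-17 06:00',
        '2009-07-13 21:00', '2009-07-14 18:00', '2009-07-15 00:04',
    ]).rename('dateandtime'),
)

# Last-value variant: each row is judged on its own score, so no groupby
# and no rolling window are needed. The column names are kept from the
# rolling version (assumption) so the later steps can be reused unchanged.
df['max_last3'] = df['score']
df['max_last3_ciritical'] = df['score'] >= 12
df['max_last3_good'] = df['score'] < 8

print(df)
```

Because the comparison is row-wise, it does not depend on how the rows are sorted, unlike the rolling and expanding variants.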