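I am trying to determine patient adherence to a particular medical treatment, but the function I wrote (using apply) only works on data frames with fewer than 100 rows.
I have two relevant data frames, which I have trimmed down here to protect patient data. The first is Advice, which contains treatment advice entries indexed by a unique patient identification number (UID):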
> head(Advice)
# A tibble: 6 x 4
# Groups: UID [3]
UID eyepartid Proctype entereddatetime
<dbl> <chr> <chr> <dttm>
1 11556127 1 Retina Laser 2017-06-14 12:54:18
2 11556127 2 Retina Laser 2017-06-14 12:54:18
3 2680380 2 Retina Laser 2017-06-14 10:40:22
4 2680380 1 Retina Laser 2017-06-14 10:40:22
5 11275381 2 Retina Laser 2017-06-14 13:01:04
6 11275381 1 Retina Laser 2017-06-14 13:01:04
The second is Treatment, which contains entries recording when the patient actually came in for treatment, also indexed by UID:
>head(Treatment)
# A tibble: 6 x 4
UID eyepartid lasertype entereddatetime
<dbl> <dbl> <chr> <dttm>
1 11333944 1 Retina Laser Laser Type 2017-04-21 12:42:49
2 12022346 1 Yellow 2017-11-01 09:18:42
3 12123496 2 Green 2017-11-20 16:11:43
4 12291214 1 Yellow 2017-12-23 10:21:45
5 11005906 2 Yellow 2017-12-23 13:13:48
6 12341193 2 Green 2018-01-19 09:12:26
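As a very rough estimate, the first analysis I want to do is to count how many times a patient came in for treatment within 30 days of the doctor's advice (most advice calls for 3 treatments).
To do this, I used a simple but probably inefficient algorithm that appends a new column to the Advice data frame: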
Advice$treatments <- apply(Advice, 1,
function(x) {
# get date of the advice entry
AdvisedDay <- x["entereddatetime"]
# take the subset of Treatment that has the correct UID and is within 30 days
## of the advice entry
TreatSubset <- filter(UID_Treatment, UID == x["UID"],
(difftime(Treatment$entereddatetime, AdvisedDay, units = "days") <= 30))
#return the number of rows in TreatSubset
nrow(TreatSubset)
})
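For comparison, here is a minimal sketch of the same count written without apply(), so that each column keeps its own type instead of being passed to the function as one row vector. It uses only base R. The toy data and the helper name count_treatments are made up for illustration, and the sketch additionally assumes that a treatment only counts if it happens on or after the advice date, which the code above does not enforce.

```r
# Hypothetical toy data, not the real patient records
advice <- data.frame(
  UID = c(101, 101, 2002),
  entereddatetime = as.POSIXct(c("2017-06-14 12:54:18",
                                 "2017-06-14 12:54:18",
                                 "2017-06-14 10:40:22"), tz = "UTC")
)
treatment <- data.frame(
  UID = c(101, 101, 101, 2002, 2002),
  entereddatetime = as.POSIXct(c("2017-06-20 09:00:00",
                                 "2017-07-01 09:00:00",
                                 "2017-09-01 09:00:00",
                                 "2017-06-01 09:00:00",
                                 "2017-06-15 09:00:00"), tz = "UTC")
)

# Hypothetical helper: for each advice row, count that patient's treatments
# falling between 0 and window_days days after the advice entry
count_treatments <- function(advice, treatment, window_days = 30) {
  vapply(seq_len(nrow(advice)), function(i) {
    same_patient <- treatment$UID == advice$UID[i]
    days_after <- as.numeric(difftime(treatment$entereddatetime,
                                      advice$entereddatetime[i],
                                      units = "days"))
    sum(same_patient & days_after >= 0 & days_after <= window_days)
  }, integer(1))
}

advice$treatments <- count_treatments(advice, treatment)
```

Indexing the columns directly (advice$UID[i]) keeps UID numeric and entereddatetime a date-time, so the comparisons do not depend on how a row is converted to a vector.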
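What I am struggling with is that the algorithm works perfectly when I call it on head(Advice) or on any part of the Advice data frame with fewer than 100 rows, but when I call it on the whole Advice data frame it returns zero for every row.
For example: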
adviceToy <- Advice[1:10, ]
*Run the above function on adviceToy*
>adviceToy
# A tibble: 10 x 5
# Groups: UID [7]
UID eyepartid Proctype entereddatetime treatments
<dbl> <chr> <chr> <dttm> <int>
1 11556127 1 Retina Laser 2017-06-14 12:54:18 3
2 11556127 2 Retina Laser 2017-06-14 12:54:18 3
3 2680380 2 Retina Laser 2017-06-14 10:40:22 0
4 2680380 1 Retina Laser 2017-06-14 10:40:22 0
5 11275381 2 Retina Laser 2017-06-14 13:01:04 1
6 11275381 1 Retina Laser 2017-06-14 13:01:04 1
7 11557272 3 Retina Laser 2017-06-14 14:22:53 2
8 11492720 2 Retina Laser 2017-06-14 13:04:41 2
9 11030362 3 Retina Laser 2017-06-14 15:27:36 2
10 11244084 3 Retina Laser 2017-06-14 17:06:16 0
This is the expected output. But...
*Now run the function on the full Advice data frame* *No warning messages*
>Advice
# A tibble: 6 x 5
# Groups: UID [3]
UID eyepartid Proctype entereddatetime treatments
<dbl> <chr> <chr> <dttm> <int>
1 11556127 1 Retina Laser 2017-06-14 12:54:18 0
2 11556127 2 Retina Laser 2017-06-14 12:54:18 0
3 2680380 2 Retina Laser 2017-06-14 10:40:22 0
4 2680380 1 Retina Laser 2017-06-14 10:40:22 0
5 11275381 2 Retina Laser 2017-06-14 13:01:04 0
6 11275381 1 Retina Laser 2017-06-14 13:01:04 0
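The same function now returns all zeros for treatments.
Any ideas about the root cause of this problem?
Note: I have already cleaned the data of all NA and NULL values.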
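One thing that may be worth checking: apply() converts a data frame to a matrix before iterating over rows. When the columns have mixed types the result is a character matrix, and numeric columns go through format(), which pads values to a common width. The sketch below uses made-up toy values (not the real data) to show a UID coming back padded once a longer UID is present in the same column. Whether this is what happens with the full Advice data frame is an assumption, not something verified here.

```r
# Hypothetical toy data: all UIDs have the same number of digits
toy_small <- data.frame(
  UID = c(11556127, 11275381),
  Proctype = c("Retina Laser", "Retina Laser"),
  stringsAsFactors = FALSE
)
# Same rows plus one made-up UID with an extra digit
toy_large <- data.frame(
  UID = c(11556127, 11275381, 123456789),
  Proctype = c("Retina Laser", "Retina Laser", "Retina Laser"),
  stringsAsFactors = FALSE
)

# Extract the UID exactly as a row-wise function inside apply() would see it
uid_small <- unname(apply(toy_small, 1, function(x) x[["UID"]]))
uid_large <- unname(apply(toy_large, 1, function(x) x[["UID"]]))

nchar(uid_small)  # every UID is 8 characters wide
nchar(uid_large)  # every UID is now 9 characters wide, shorter ones are padded

# Comparing against the numeric UID, as filter(..., UID == x["UID"]) does
match_small <- uid_small[1] == 11556127
match_large <- uid_large[1] == 11556127  # no longer matches once padded
```

If padding like this turns out to be the cause, printing x["UID"] inside the function (for instance with dput()) on the full data frame should show the leading spaces.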