I'm looking for a simple solution to the following problem. This is what my data looks like:
ClientID PatientID Measure Value CollectionDatetime
41 123456 Temperature 87 02-04-2017
41 123456 WBC 1000 02-04-2017
41 123456 Temperature 83 02-05-2017
41 23456 WBC 10000 02-04-2017
41 23456 RR 100 02-04-2017
41 23456 C-Ceratine 90 02-05-2017
41 23456 Temperature 87 02-06-2017
41 23456 Temperature 89 02-06-2017
This is the output I want:
ClientID PatientID Measure Value CollectionDatetime Label
41 123456 Temperature 87 02-04-2017 1
41 123456 WBC 1000 02-04-2017 1
41 123456 Temperature 87 02-04-2017 2
41 123456 WBC 1000 02-04-2017 2
41 123456 Temperature 83 02-05-2017 2
41 23456 WBC 10000 02-04-2017 1
41 23456 RR 100 02-04-2017 1
41 23456 WBC 10000 02-04-2017 2
41 23456 RR 100 02-04-2017 2
41 23456 C-Ceratine 90 02-05-2017 2
41 23456 WBC 10000 02-04-2017 3
41 23456 RR 100 02-04-2017 3
41 23456 C-Ceratine 90 02-05-2017 3
41 23456 Temperature 87 02-06-2017 3
41 23456 Temperature 89 02-06-2017 3
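The rows should be replicated based on PatientID and CollectionDatetime. For each PatientID, label 1 should hold the data from day 1, label 2 should hold the data from day 1 and day 2, and so on.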
Answer 0 (score: 0)
Using the data.table package:
# load the data.table package & convert 'dat' to a data.table
library(data.table)
setDT(dat)
# create the 'lbl' variable and the number of times each row needs to be repeated
dat[, lbl := rleid(CollectionDatetime), PatientID
][, reps := abs(lbl - max(lbl)), PatientID]
# create a 2nd data.table with the repeated rows
# make a sequence for each replication
# add that to 'lbl' to get correct 'lbl'
d2 <- dat[rep(1:nrow(dat), reps)][, lbl := lbl + 1:max(reps), .(PatientID,lbl)]
# bind the original data.table and the new together
# remove 'reps' column (no longer needed)
# and order to match the expected output
rbindlist(list(dat,d2))[, reps := NULL][order(-PatientID,lbl,CollectionDatetime)]
which gives:
ClientID PatientID Measure Value CollectionDatetime lbl
1: 41 123456 Temperature 87 2017-02-04 1
2: 41 123456 WBC 1000 2017-02-04 1
3: 41 123456 Temperature 87 2017-02-04 2
4: 41 123456 WBC 1000 2017-02-04 2
5: 41 123456 Temperature 83 2017-02-05 2
6: 41 23456 WBC 10000 2017-02-04 1
7: 41 23456 RR 100 2017-02-04 1
8: 41 23456 WBC 10000 2017-02-04 2
9: 41 23456 RR 100 2017-02-04 2
10: 41 23456 C-Ceratine 90 2017-02-05 2
11: 41 23456 WBC 10000 2017-02-04 3
12: 41 23456 RR 100 2017-02-04 3
13: 41 23456 C-Ceratine 90 2017-02-05 3
14: 41 23456 Temperature 87 2017-02-06 3
15: 41 23456 Temperature 89 2017-02-06 3
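To see why the `reps` and `lbl + 1:max(reps)` steps produce the right labels, here is a minimal sketch on a bare vector of labels for one hypothetical patient (the vector is made up for illustration and mirrors patient 23456). A row with label k has to reappear under every later label, so it needs `max(lbl) - k` extra copies, labelled k + 1, k + 2, and so on.

```r
# Day labels for one hypothetical patient: two rows on day 1, one on day 2, two on day 3
lbl <- c(1, 1, 2, 3, 3)

# Extra copies each row needs: rows from the last day need none
reps <- max(lbl) - lbl            # 2 2 1 0 0

# Row indices of the extra copies (each row repeated 'reps' times)
idx <- rep(seq_along(lbl), reps)  # 1 1 2 2 3

# Each copy gets its original label shifted by 1, 2, ..., reps
new_lbl <- lbl[idx] + sequence(reps)  # 2 3 2 3 3

# Rows per label after combining originals and copies: 2, 3 and 5
table(c(lbl, new_lbl))
```

The counts 2, 3 and 5 match the number of rows per label for patient 23456 in the output above.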
You can achieve the same in base R:
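# number the collection days within each patient (assumes rows are sorted by date within patient)
dat$lbl <- with(dat, ave(as.numeric(CollectionDatetime), PatientID, FUN = function(x) cumsum(c(1, diff(x) > 0))))
# number of extra copies each row needs (0 for rows from the patient's last day)
dat$reps <- with(dat, ave(lbl, PatientID, FUN = function(x) abs(x - max(x))))
# repeat each row 'reps' times
dat2 <- dat[rep(1:nrow(dat), dat$reps),]
# shift the label of each copy by 1, 2, ..., reps
dat2$lbl <- dat2$lbl + with(dat2, ave(reps, cumsum(c(0,abs(diff(dat2$reps)))), FUN = function(x) 1:max(x)))
# bind originals and copies, drop the 'reps' column (column 7) and order to match the expected output
d <- rbind(dat,dat2)[,-7]
d[order(-d$PatientID,d$lbl,d$CollectionDatetime),]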
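If the row-repetition bookkeeping is hard to follow, the rule from the question can also be written more literally: for each patient and each label k, take every row collected on day k or earlier. The sketch below is one possible way to do that in base R, not the method used above. `cumulative_days()` is a hypothetical helper named only for this illustration, and the example data is rebuilt inline (with `Measure` as character rather than factor) so the block runs on its own.

```r
# Example data from this answer, rebuilt inline so the sketch is self-contained
dat <- data.frame(
  ClientID  = 41L,
  PatientID = c(123456L, 123456L, 123456L, 23456L, 23456L, 23456L, 23456L, 23456L),
  Measure   = c("Temperature", "WBC", "Temperature", "WBC", "RR",
                "C-Ceratine", "Temperature", "Temperature"),
  Value     = c(87L, 1000L, 83L, 10000L, 100L, 90L, 87L, 89L),
  CollectionDatetime = as.Date(c("2017-02-04", "2017-02-04", "2017-02-05",
                                 "2017-02-04", "2017-02-04", "2017-02-05",
                                 "2017-02-06", "2017-02-06"))
)

# Hypothetical helper (not from any package): builds the cumulative
# blocks for the rows of a single patient
cumulative_days <- function(p) {
  # rank the patient's distinct collection dates: 1 = first day, 2 = second day, ...
  day <- match(p$CollectionDatetime, sort(unique(p$CollectionDatetime)))
  # for label k, keep every row collected on day k or earlier
  pieces <- lapply(seq_len(max(day)), function(k) {
    cbind(p[day <= k, , drop = FALSE], lbl = k)
  })
  do.call(rbind, pieces)
}

# apply the helper per patient, then order as in the expected output
res <- do.call(rbind, lapply(split(dat, dat$PatientID), cumulative_days))
res <- res[order(-res$PatientID, res$lbl, res$CollectionDatetime), ]
rownames(res) <- NULL
res
```

This version does not require the rows to be sorted by date beforehand, at the cost of building one small data frame per patient and label, which may be slow on large data.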
Data used:
dat <- structure(list(ClientID = c(41L, 41L, 41L, 41L, 41L, 41L, 41L, 41L),
PatientID = c(123456L, 123456L, 123456L, 23456L, 23456L, 23456L, 23456L, 23456L),
Measure = structure(c(3L, 4L, 3L, 4L, 2L, 1L, 3L, 3L), .Label = c("C-Ceratine", "RR", "Temperature", "WBC"), class = "factor"),
Value = c(87L, 1000L, 83L, 10000L, 100L, 90L, 87L, 89L),
CollectionDatetime = structure(c(17201, 17201, 17202, 17201, 17201, 17202, 17203, 17203), class = "Date")),
.Names = c("ClientID", "PatientID", "Measure", "Value", "CollectionDatetime"), row.names = c(NA, -8L), class = "data.frame")