我有一个具有这种结构的数据集:
ID = c(1,1,1,1,2,2,2,3,3,3,3)
L40 = c(1, NA, NA, NA, 1, NA, NA, NA, 1, NA, NA)
K50 = c(NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, 1)
df = data.frame(ID, L40, K50)
当列L40和K50中出现缺失值时,我希望继续该列中的最后一个非缺失值,条件是ID与先前ID相同,并且当前行中L40和K50中的值为空。我应用了以下代码:
library(tidyr)
df2 <- df %>% group_by(ID) %>% fill(L40:K50)
这无法达到我想要的效果。我希望只有当该行中的其他列(ID除外)为空时,才能将之前的非缺失值转移到下一行。这就是我想要的:
ID = c(1,1,1,1,2,2,2,3,3,3,3)
L40 = c(1, 1, 1, 1, 1, NA, NA, NA, 1, 1, NA)
K50 = c(NA, NA, NA, NA, NA, 1, 1, NA, NA, NA, 1)
df3 = data.frame(ID, L40, K50)
答案 0 :(得分:0)
我们可以使用na.locf
library(data.table)
library(zoo)
setDT(df)[, if(any(is.na(K50[-1]))) lapply(.SD, na.locf) else .SD , by = ID]
# ID L40 K50
#1: 1 1 NA
#2: 1 1 NA
#3: 1 1 NA
#4: 1 1 NA
#5: 2 1 NA
#6: 2 NA 1
#7: 3 NA 1
#8: 3 NA 1
#9: 3 NA 1
使用dplyr
的选项将是
library(dplyr)
df %>%
mutate(ind = rowSums(is.na(.))) %>%
group_by(ID) %>%
mutate_each(funs(if(any(ind>1)) na.locf(., na.rm=FALSE) else .), L40:K50) %>%
select(-ind)
# ID L40 K50
# <dbl> <dbl> <dbl>
#1 1 1 NA
#2 1 1 NA
#3 1 1 NA
#4 1 1 NA
#5 2 1 NA
#6 2 NA 1
#7 3 NA 1
#8 3 NA 1
#9 3 NA 1
答案 1 :(得分:0)
我在这个问题上玩了一段时间,由于我对R的了解有限,我想出了以下解决方法。为了说明的目的,我在原始数据框中添加了一个日期列:
ID = c(1,1,1,1,2,2,2,3,3,3,3)
date = c(1,2,3,4,1,2,3,1,2,3,4)
L40 = c(1, 1, NA, NA, 1, NA, NA, NA, 1, NA, NA)
K50 = c(NA, 1, 1, NA, NA, 1, NA, NA, NA, NA, 1)
df = data.frame(ID, date, L40, K50)
这是我做的:
#gather the diagnosis columns in rows and keep only those rows where the patient has the associated diagnosis.
df1 <- df %>% gather(diagnos, dummy, L40:K50) %>% filter(dummy==1) %>% arrange(ID, date)
#concatenate across rows by ID and date to collect all diagnoses of an ID at a particular date.
df2 <- df1 %>% group_by(ID, date) %>% mutate(diag = paste(diagnos, collapse=" ")) %>% select(-diagnos, -dummy)
#convert into data tables in preparation for join
Dt1 <- data.table(df)
Dt2 <- data.table(df2)
setkey(Dt1, ID, date)
setkey(Dt2, ID, date)
#Each observation in Dt1 is matched with the observation in Dt1 with the same date or, if that particular date is not present,
#by the nearest previous date:
final <- Dt2[Dt1, roll=TRUE] %>% distinct()
这将诊断的名称延续到下一次观察诊断。