After some data wrangling and merging on my dataset, I ended up with three variables that carry the same information, as in this dummy example:
cond.x <- c("1","2", "3","4",NA, "4", "1")
cond.y <- c("1", NA, "3", NA, "1", "4", NA)
dx <- c("scz", "cont", "siscz", "sicon", "scz", NA,NA)
mydata <-data.frame(cond.x, cond.y, dx)
> mydata
  cond.x cond.y    dx
1      1      1   scz
2      2   <NA>  cont
3      3      3 siscz
4      4   <NA> sicon
5   <NA>      1   scz
6      4      4  <NA>
7      1   <NA>  <NA>
So 1 means scz, 2 means cont, 3 means siscz, and 4 means sicon.
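The mapping above can be written down explicitly and used to complete each row. This is a minimal base R sketch, assuming the goal is to fill every missing cell from whichever of the three columns is present in the same row; the names code_to_dx, dx_to_code, code and filled are illustrative and do not come from the original post.

```r
# Dummy data from the question, kept as character columns
cond.x <- c("1", "2", "3", "4", NA, "4", "1")
cond.y <- c("1", NA, "3", NA, "1", "4", NA)
dx     <- c("scz", "cont", "siscz", "sicon", "scz", NA, NA)
mydata <- data.frame(cond.x, cond.y, dx, stringsAsFactors = FALSE)

# Assumed code-to-label mapping, taken from the sentence above
code_to_dx <- c("1" = "scz", "2" = "cont", "3" = "siscz", "4" = "sicon")
# Reverse mapping: label -> code
dx_to_code <- setNames(names(code_to_dx), code_to_dx)

# First non-missing code per row: cond.x, then cond.y, then the code implied by dx
code <- ifelse(!is.na(mydata$cond.x), mydata$cond.x, mydata$cond.y)
code <- ifelse(!is.na(code), code, unname(dx_to_code[mydata$dx]))

# Rebuild all three columns from the completed code
filled <- data.frame(cond.x = code, cond.y = code,
                     dx = unname(code_to_dx[code]),
                     stringsAsFactors = FALSE)
filled
```

Because the lookup is a named vector, the result does not depend on the order in which the labels happen to appear in the data.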
Answer 0 (score: 1)
Convert dx to a factor and store its levels in level_dx. Then convert all three columns of mydata to integer type.
mydata$dx <- factor(mydata$dx, levels = c("scz", "cont", "siscz", "sicon"))
level_dx <- levels(mydata$dx)
mydata[, 1:2] <- lapply(mydata[, 1:2], function(x) as.integer(as.character(x)) )
mydata$dx <- as.integer(mydata$dx)
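Use the fill function from the tidyr package on the transposed data: after transposing, each original row becomes a column, so filling up and then down copies the non-missing code of a row into its missing cells. Then transpose back and convert the dx column back to a factor variable.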
library('tidyr')
mydata <- fill( data.frame(t(mydata)), 1:7, .direction = 'up')
mydata <- data.frame( t( fill( mydata, 1:7, .direction = 'down') ) )
mydata$dx <- factor( mydata$dx, levels = sort(unique( mydata$dx )), labels = level_dx)
#    cond.x cond.y    dx
# X1      1      1   scz
# X2      2      2  cont
# X3      3      3 siscz
# X4      4      4 sicon
# X5      1      1   scz
# X6      4      4 sicon
# X7      1      1   scz
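To see why the answer fills in both directions, here is a small sketch of how tidyr's fill behaves on a single column. The data frame toy and its column v are made up for illustration only.

```r
library(tidyr)

# Toy column with gaps at the start, in the middle, and at the end
toy <- data.frame(v = c(NA, 1, NA, 2, NA))

# "down" carries the last seen value forward; a leading NA stays NA
down <- fill(toy, v, .direction = "down")$v

# "up" carries the next value backward; a trailing NA stays NA
up <- fill(toy, v, .direction = "up")$v

# Applying both directions in sequence leaves no gaps
both <- fill(fill(toy, v, .direction = "up"), v, .direction = "down")$v
```

A single direction can leave a gap at one end of the column, which is why the answer applies 'up' first and 'down' afterwards.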
Data:
cond.x <- c("1","2", "3","4",NA, "4", "1")
cond.y <- c("1", NA, "3", NA, "1", "4", NA)
dx <- c("scz", "cont", "siscz", "sicon", "scz", NA,NA)
mydata <-data.frame(cond.x, cond.y, dx)
mydata
#   cond.x cond.y    dx
# 1      1      1   scz
# 2      2   <NA>  cont
# 3      3      3 siscz
# 4      4   <NA> sicon
# 5   <NA>      1   scz
# 6      4      4  <NA>
# 7      1   <NA>  <NA>
Answer 1 (score: 1)
A bit shorter, thanks mainly to the data.table package:
x <- c("1","2", "3","4",NA, "4", "1")
y <- c("1", NA, "3", NA, "1", "4", NA)
dx <- c("scz", "cont", "siscz", "sicon", "scz", NA,NA)
mydata <- data.frame(x, y, dx, stringsAsFactors = FALSE)
library(data.table)
# Convert to data.table by reference
setDT(mydata)
# Merge x and y into xy: take x, falling back to y where x is NA
# (fcoalesce requires data.table >= 1.12.4)
mydata[, xy := fcoalesce(x, y)][]
# Create lookup table (written without %>%, which data.table does not provide)
lookup <- na.omit(mydata[, .(xy = first(xy)), by = dx])
setnames(lookup, c('dx_l', 'xy'))
# Join mydata with lookup using xy
mydata[lookup, dy := dx_l, on = c(xy = 'xy')][]
mydata[, .(dy)]
# dy
# 1: scz
# 2: cont
# 3: siscz
# 4: sicon
# 5: scz
# 6: sicon
# 7: scz
Answer 2 (score: 1)
We can do this with coalesce from dplyr, which takes the first non-NA entry of 'cond.x' and 'cond.y' for each row; the result is then used as an index to update 'dx'.
library(tidyverse)
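mydata %>%
  # Index dx by position, so the codes must be integers; this relies on
  # rows 1-4 of dx holding the labels for codes 1-4 in that order
  mutate(dx = dx[as.integer(coalesce(as.character(cond.x), as.character(cond.y)))])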
#   cond.x cond.y    dx
# 1      1      1   scz
# 2      2   <NA>  cont
# 3      3      3 siscz
# 4      4   <NA> sicon
# 5   <NA>      1   scz
# 6      4      4 sicon
# 7      1   <NA>   scz
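Indexing dx by position works here only because the first four rows of the dummy data happen to list the labels in code order. A variant that does not depend on row order could use an explicit lookup vector. This is a sketch under that assumption; code_to_dx, code and result are illustrative names, not part of the original answer.

```r
library(dplyr)

mydata <- data.frame(
  cond.x = c("1", "2", "3", "4", NA, "4", "1"),
  cond.y = c("1", NA, "3", NA, "1", "4", NA),
  dx     = c("scz", "cont", "siscz", "sicon", "scz", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical lookup vector built from the mapping stated in the question
code_to_dx <- c("1" = "scz", "2" = "cont", "3" = "siscz", "4" = "sicon")

result <- mydata %>%
  mutate(
    # First non-NA code of the two condition columns
    code = coalesce(cond.x, cond.y),
    # Keep an existing dx, otherwise look the label up by code
    dx   = coalesce(dx, unname(code_to_dx[code]))
  ) %>%
  select(-code)

result
```

As in the answer above, cond.x and cond.y keep their NA values; only dx is completed.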