Continuing from my previous question, "How do I return multiple columns without consider Na values and group by other columns name in R?":
Mexico_01 <- c(1,2,5,1,NA,1)
Mexico_02 <- c(3,NA,2,0,4,1)
Argentina_01 <- c(2,1,5,2,NA,2)
Argentina_02 <- c(2,3,NA,2,2,2)
Italy <- c(NA,10,10,10,NA,10)
Spain_01 <- c(2,NA,4,6,8,11)
Spain_02 <- c(3,4,NA,11,11,11)
England <- c(5,NA,10,NA,NA,12)
Germany <- c(1,NA,NA,NA,NA,10)
Data_Risk <- data.frame(Mexico_01, Mexico_02, Argentina_01, Argentina_02,
                        Italy, Spain_01, Spain_02, England, Germany)

# Load the packages before calling as.data.table()
library(data.table)
library(magrittr)
Data_Risk <- as.data.table(Data_Risk)

# Row/column positions of every non-NA cell
all_variable <- as.data.table(which(!is.na(Data_Risk), arr.ind = TRUE))

# For each row, list the non-NA column names and spread them into var1, var2, ...
all_variable[, .(colnm = names(Data_Risk)[col],
                 col = paste0('var', order(col))), by = row] %>%
  dcast(row ~ col, value.var = 'colnm')
This gives:
   row      var1         var2         var3         var4     var5     var6     var7    var8    var9
1:   1 Mexico_01    Mexico_02 Argentina_01 Argentina_02 Spain_01 Spain_02  England Germany    <NA>
2:   2 Mexico_01 Argentina_01 Argentina_02        Italy Spain_02     <NA>     <NA>    <NA>    <NA>
3:   3 Mexico_01    Mexico_02 Argentina_01        Italy Spain_01  England     <NA>    <NA>    <NA>
4:   4 Mexico_01    Mexico_02 Argentina_01 Argentina_02    Italy Spain_01 Spain_02    <NA>    <NA>
5:   5 Mexico_02 Argentina_02     Spain_01     Spain_02     <NA>     <NA>     <NA>    <NA>    <NA>
6:   6 Mexico_01    Mexico_02 Argentina_01 Argentina_02    Italy Spain_01 Spain_02 England Germany
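In this case, I need to treat all variables that share a prefix as a single variable; for example, instead of Mexico_01 or Mexico_02, select only mexico.
So the final table must look like this: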
var1 var2 var3 var4 var5 var6
mexico argentina england germany null null
mexico argentina italy null null null
mexico argentina italy spain england null
mexico argentina italy spain null null
spain null null null null null
mexico argentina italy spain england germany
Answer 0 (score: 0)
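We can split the column names with tstrsplit, get the indices of the duplicated entries based on the 'row' and 'V1' columns, assign NA to those elements of 'V1', and then run dcast.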
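# Long table: one record per non-NA cell, with its column name and var index
out <- all_variable[, .(colnm = names(Data_Risk)[col],
                        col = paste0('var', order(col))), by = row]
# Split e.g. "Mexico_01" into prefix (V1) and suffix (V2)
out[, c("V1", "V2") := tstrsplit(colnm, "_")]
# Indices of prefixes that repeat within the same row
i1 <- out[, .I[duplicated(.SD)], .SDcols = c('row', 'V1')]
out[i1, V1 := NA_character_]
# Within each row, move the NA entries to the end
out[, V1 := V1[order(is.na(V1))], row]
# Reshape to wide format and drop the row id
dcast(out, row ~ col, value.var = "V1")[, row := NULL][]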
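
As a cross-check on the data.table approach, the same idea (strip the suffix, keep each prefix once per row, pad with NA) can be sketched in base R. This is only an illustrative sketch, not part of the original answer: the names `prefix`, `per_row`, `width` and `result` are made up here, and it assumes the suffix is everything from the first underscore onward. The prefixes keep the capitalization of the column names.

```r
# Rebuild the example data (same values as in the question).
Data_Risk <- data.frame(
  Mexico_01    = c(1, 2, 5, 1, NA, 1),
  Mexico_02    = c(3, NA, 2, 0, 4, 1),
  Argentina_01 = c(2, 1, 5, 2, NA, 2),
  Argentina_02 = c(2, 3, NA, 2, 2, 2),
  Italy        = c(NA, 10, 10, 10, NA, 10),
  Spain_01     = c(2, NA, 4, 6, 8, 11),
  Spain_02     = c(3, 4, NA, 11, 11, 11),
  England      = c(5, NA, 10, NA, NA, 12),
  Germany      = c(1, NA, NA, NA, NA, 10)
)

# Assumption: the prefix is everything before the first underscore.
prefix <- sub("_.*$", "", names(Data_Risk))

# For each row, keep the prefixes of the non-NA columns, once each,
# in the original column order.
per_row <- lapply(seq_len(nrow(Data_Risk)), function(i) {
  unique(prefix[!is.na(unlist(Data_Risk[i, ]))])
})

# Pad every row with NA up to the longest row and bind into a matrix.
width <- max(lengths(per_row))
result <- t(vapply(
  per_row,
  function(p) c(p, rep(NA_character_, width - length(p))),
  character(width)
))
colnames(result) <- paste0("var", seq_len(width))
result
```

Because `unique()` is applied per row before padding, there is no need to shift NA values to the end afterwards, which is the step `order(is.na(V1))` handles in the data.table version.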