By ID and variable type

Time: 2017-12-20 13:20:40

Tags: r dataframe reshape

I am unable to rearrange the following data frame:

dat1 <- data.frame(
   id = rep(1, 4),
   var = paste0(rep(c("firstName",  "secondName"), each= 2), c(rep(1:2, 2))),
   value = c(1:4)
 )
dat2 <- data.frame(
   id = rep(2, 3),
   var = paste0(rep(c("firstName", "secondName"), each = 2)[1:3],
                c(rep(1:2, 2))[1:3]),
   value = c(5:7)
 )
dat = rbind(dat1, dat2)
dat$type = gsub('[0-9]', '', dat$var)
# > dat
#   id         var value       type
# 1  1  firstName1     1  firstName
# 2  1  firstName2     2  firstName
# 3  1 secondName1     3 secondName
# 4  1 secondName2     4 secondName
# 5  2  firstName1     5  firstName
# 6  2  firstName2     6  firstName
# 7  2 secondName1     7 secondName

I would like to get the following result:

id firstName secondName
 1         1          3
 1         2          4
 2         5          7
 2         6         NA

I have tried unstack(dat, form = value ~ type), but it does not work.
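One likely reason, sketched here in base R: unstack() only looks at value and type, so it ignores id, and because the two groups have different lengths it falls back to returning a plain list instead of a data frame. The data frame below is rebuilt by hand from the printed output above so the snippet runs on its own.

```r
# Rebuild the first version of `dat` by hand (values copied from the question)
dat <- data.frame(
  id    = c(1, 1, 1, 1, 2, 2, 2),
  var   = c("firstName1", "firstName2", "secondName1", "secondName2",
            "firstName1", "firstName2", "secondName1"),
  value = 1:7
)
dat$type <- gsub("[0-9]", "", dat$var)

# unstack() groups `value` by `type` only; `id` plays no role here
res <- unstack(dat, form = value ~ type)

# The groups have 4 and 3 values, so the result is a list, not a data frame
str(res)
```

Since id is dropped and the groups are ragged, the rows cannot be lined up per id this way, which is why the answers below build an explicit row key first.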

Updated question: firstName1 should correspond to secondName1, so if I change dat2 to

dat2 <- data.frame(
   id = rep(2, 3),
   var = paste0(rep(c("firstName", "secondName"), each = 2)[2:4],
                c(rep(1:2, 2))[2:4]),
   value = c(5:7)
 )
# > dat
#    id         var value       type
# 1:  1  firstName1     1  firstName
# 2:  1  firstName2     2  firstName
# 3:  1 secondName1     3 secondName
# 4:  1 secondName2     4 secondName
# 5:  2  firstName2     5  firstName
# 6:  2 secondName1     6 secondName
# 7:  2 secondName2     7 secondName

For id = 2, the (firstName, secondName) rows should then be c(NA, 6) and c(5, 7). How can I handle this case?

3 answers:

Answer 0 (score: 6)

I think a better option is to use the rowid function from data.table:

library(data.table)
dcast(setDT(dat), id + rowid(type) ~ type, value.var = 'value')[, type := NULL][]

which gives:

   id firstName secondName
1:  1         1          3
2:  1         2          4
3:  2         5          7
4:  2         6         NA

For the updated question:

setDT(dat)[, num := gsub('.*([0-9])', '\\1', var)
           ][, dcast(.SD, id + num ~ type, value.var = 'value')
             ][, num := NULL][]

which gives:

   id firstName secondName
1:  1         1          3
2:  1         2          4
3:  2        NA          6
4:  2         5          7

Answer 1 (score: 3)

Try dcast:

res <- data.table::dcast(
    dat,
    id  + substring(as.character(var), nchar(as.character(var))) ~ type,
    value.var = 'value')

res[2] <- NULL

# > res
#   id firstName secondName
# 1  1         1          3
# 2  1         2          4
# 3  2         5          7
# 4  2         6         NA

substring(as.character(var), nchar(as.character(var))) takes the last character of the second column (var) and uses it as the grouping variable.
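A minimal base R sketch of what that expression returns, using a few illustrative values shaped like the var column. The second variant (stripping the leading non-digits with sub) is not part of the answer; it is an assumption about what one might want if the numeric suffix could have more than one digit, in which case the last character alone would no longer identify the group.

```r
# Illustrative values mirroring the `var` column from the question
var <- c("firstName1", "firstName2", "secondName1")

# Start at position nchar(var); the default `last` keeps everything from there,
# i.e. only the final character of each string
last_char <- substring(var, nchar(var))
last_char

# Hypothetical multi-digit case: drop the leading non-digit part instead
var_long <- c("firstName9", "firstName10")
suffix <- sub("^[^0-9]*", "", var_long)
suffix
```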

Answer 2 (score: 3)

library(tidyr)

rbind(dat1,dat2) %>% separate(var,c("name","index"),"(?=\\d+$)") %>%
spread(key=name,value=value)

Result

  id index firstName secondName
1  1     1         1          3
2  1     2         2          4
3  2     1         5          7
4  2     2         6         NA

Note

If you want to drop the index column, add %>% dplyr::select(-index) at the end.