I am having trouble reshaping the following data frame:
dat1 <- data.frame(
id = rep(1, 4),
var = paste0(rep(c("firstName", "secondName"), each= 2), c(rep(1:2, 2))),
value = c(1:4)
)
dat2 <- data.frame(
id = rep(2,3),
var = paste0(rep(c("firstName", "secondName"), each= 2)[1:3], c(rep(1:2,
2))[1:3]),
value = c(5:7)
)
dat = rbind(dat1, dat2)
dat$type = gsub('[0-9]', '', dat$var)
# > dat
#   id         var value       type
# 1  1  firstName1     1  firstName
# 2  1  firstName2     2  firstName
# 3  1 secondName1     3 secondName
# 4  1 secondName2     4 secondName
# 5  2  firstName1     5  firstName
# 6  2  firstName2     6  firstName
# 7  2 secondName1     7 secondName
I would like to get the following result:
id firstName secondName
1 1 3
1 2 4
2 5 7
2 6 NA
I have tried unstack(dat, form = value ~ type), but it does not give the result I want.
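For readers wondering why the unstack() call falls short, here is a minimal base R sketch. It rebuilds the question's data by hand (the literal vectors are just a shorthand for the rep/paste0 construction above) and shows that, because the two groups have different lengths, unstack() returns a plain list and never looks at id:

```r
# Same values as the question's dat, written out literally.
dat <- data.frame(
  id    = c(1, 1, 1, 1, 2, 2, 2),
  var   = c("firstName1", "firstName2", "secondName1", "secondName2",
            "firstName1", "firstName2", "secondName1"),
  value = 1:7
)
dat$type <- gsub("[0-9]", "", dat$var)

res <- unstack(dat, form = value ~ type)

# firstName has 4 values and secondName has 3, so the groups cannot be
# bound into columns of equal length; unstack() returns a list instead.
str(res)
is.data.frame(res)
```

So the values are only grouped by type; the pairing of values within each id is lost, which is why a reshape keyed on id is needed.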
Update to the question: firstName1 should be paired with secondName1, so if I change dat2 to
dat2 <- data.frame(
  id = rep(2, 3),
  var = paste0(rep(c("firstName", "secondName"), each = 2)[2:4], rep(1:2, 2)[2:4]),
  value = 5:7
)
# > dat
# id var value type
# 1: 1 firstName1 1 firstName
# 2: 1 firstName2 2 firstName
# 3: 1 secondName1 3 secondName
# 4: 1 secondName2 4 secondName
# 5: 2 firstName2 5 firstName
# 6: 2 secondName1 6 secondName
# 7: 2 secondName2 7 secondName
For id = 2, the two result rows should be c(NA, 6) and c(5, 7), i.e. firstName is missing in the first row. How can this case be handled?
Answer 0 (score: 6)
I think a better option is to use the rowid function from data.table:
library(data.table)
dcast(setDT(dat), id + rowid(type) ~ type, value.var = 'value')[, type := NULL][]
which gives:
   id firstName secondName
1:  1         1          3
2:  1         2          4
3:  2         5          7
4:  2         6         NA
For the updated question:
setDT(dat)[, num := gsub('.*([0-9])', '\\1', var)
][, dcast(.SD, id + num ~ type, value.var = 'value')
][, num := NULL][]
which gives:
   id firstName secondName
1:  1         1          3
2:  1         2          4
3:  2        NA          6
4:  2         5          7
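The idea behind the updated approach is to key each row on id plus the trailing number of var, so that firstNameN and secondNameN land in the same row. As a rough illustration of that idea only, here is a base R sketch that swaps the dcast call for a full outer join with merge(); the helper names first, second and wide are made up for this example:

```r
# Data from the updated question, written out literally.
dat <- data.frame(
  id    = c(1, 1, 1, 1, 2, 2, 2),
  var   = c("firstName1", "firstName2", "secondName1", "secondName2",
            "firstName2", "secondName1", "secondName2"),
  value = 1:7
)
dat$type <- gsub("[0-9]", "", dat$var)   # "firstName" / "secondName"
dat$num  <- gsub("[^0-9]", "", dat$var)  # trailing number as the pairing key

# One table per type, each carrying the (id, num) key.
first  <- dat[dat$type == "firstName",  c("id", "num", "value")]
second <- dat[dat$type == "secondName", c("id", "num", "value")]
names(first)[3]  <- "firstName"
names(second)[3] <- "secondName"

# Full outer join: a key present on only one side gets NA on the other.
wide <- merge(first, second, by = c("id", "num"), all = TRUE)
wide <- wide[order(wide$id, wide$num), ]
wide$num <- NULL
print(wide)
```

For id = 2 this yields the rows (NA, 6) and (5, 7), matching the output above. With more than two types the pairwise merge gets clumsy, which is one reason dcast is the more convenient tool here.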
Answer 1 (score: 3)
Try dcast:
res <- data.table::dcast(
dat,
id + substring(as.character(var), nchar(as.character(var))) ~ type,
value.var = 'value')
res[2] <- NULL
# > res
# id firstName secondName
# 1 1 1 3
# 2 1 2 4
# 3 2 5 7
# 4 2 6 NA
substring(as.character(var), nchar(as.character(var))) extracts the last character of var, which is then used as the grouping variable.
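A quick standalone check of what that expression returns, using a small made-up vector (the factor is only there to mimic a data frame built with strings as factors, which is why as.character is applied first):

```r
var <- factor(c("firstName1", "firstName2", "secondName1"))

chr <- as.character(var)
# substring(x, first) with first = nchar(x) keeps only the final character.
last_char <- substring(chr, nchar(chr))
print(last_char)
```

Note that this keeps exactly one character, so a suffix with more than one digit (for example a hypothetical "firstName10") would be reduced to "0".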
Answer 2 (score: 3)
library(tidyr)
rbind(dat1,dat2) %>% separate(var,c("name","index"),"(?=\\d+$)") %>%
spread(key=name,value=value)
id index firstName secondName
1 1 1 1 3
2 1 2 2 4
3 2 1 5 7
4 2 2 6 NA
To drop the helper index column, append %>% dplyr::select(-index) to the end of the pipeline.
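The key step in this answer is splitting var into a name part and a trailing numeric index. As a rough base R illustration of that split (not what separate() does internally, just the same result obtained with sub(); the input vector is made up and includes a two-digit suffix on purpose):

```r
var <- c("firstName1", "secondName2", "firstName10")

# Name: everything except the trailing run of digits.
name <- sub("[0-9]+$", "", var)
# Index: strip everything up to and including the last non-digit character.
index <- as.integer(sub("^.*[^0-9]", "", var))

print(data.frame(var, name, index))
```

Unlike taking only the last character, this keeps multi-digit indices intact.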