我想“部分”使data.frame“更宽” 并没有真正成功。
ID X1 X2 X3 X4
1l 1 5 9 13
1r 2 6 10 14
2r 3 7 11 15
2l 4 8 12 16
ID X1_l X2_l X3_l X4_l X1_r X2_r X3_r X4_r
1 1 5 9 13 2 6 10 14
2 4 8 12 16 3 7 11 15
df <- data.frame(ID=c("1l","1r","2r","2l"),matrix(1:16,ncol=4))
l
和r
的顺序并不总是相同。如示例所示。所以这里不可能作弊。我们可以做一个自然的解决方案,但让我们不要这样做。可能只有一个r
或l
个数字。因此,例如,存在18l
但不存在18r
。我的尝试是首先用相同的数字分割数据:像这样
split(x = df,f = sub("[lr]","",df$ID))
对于类似问题的任何快速解决方案?
@Arkun
错误讯息:
Error: Duplicate identifiers for rows (7561, 7562), (7581, 7582), (7599, 7600), (7619, 7620), (7563, 7564), (7583, 7584), (7601, 7602), (7621, 7622), (7565, 7566), (7585, 7586), (7603, 7604), (7623, 7624), (7567, 7568), (7587, 7588), (7605, 7606), (7625, 7626), (7569, 7570), (7607, 7608), (7571, 7572), (7589, 7590), (7609, 7610), (7573, 7574), (7591, 7592), (7611, 7612), (7575, 7576), (7593, 7594), (7613, 7614), (7577, 7578), (7595, 7596), (7615, 7616), (7579, 7580), (7597, 7598), (7617, 7618), (6577, 6578), (6597, 6598), (6615, 6616), (6635, 6636), (6579, 6580), (6599, 6600), (6617, 6618), (6637, 6638), (6581, 6582), (6601, 6602), (6619, 6620), (6639, 6640), (6583, 6584), (6603, 6604), (6621, 6622), (6641, 6642), (6585, 6586), (6623, 6624), (6587, 6588), (6605, 6606), (6625, 6626), (6589, 6590), (6607, 6608), (6627, 6628), (6591, 6592), (6609, 6610), (6629, 6630), (6593, 6594), (6611, 6612), (6631, 6632), (6595, 6596), (6613, 6614), (6633, 6634), (7643, 7644), (7663, 7664), (7681, 76
In addition: Warning message:
Too many values at 66 locations: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, ...
关于真实数据:
43例患者(1r,1l,... 43r,43l)
大约112个变量。首先在ID v2_1_prot
和最后v2_80_CK
所以我用x1:x4
替换了v2_1_prot:v2_80_CK
的部分。我在这里做错了吗?
我检查了我的ID变量到目前为止没有重复。
all_pat_rbind2$ID %>% count %>% extract2("freq") %>% max # is equal to one
答案 0 :(得分:3)
我们可以使用tidyverse
library(dplyr)
library(tidyr)
df %>%
separate(ID, into = c("ID", "v1"), sep="(?<=\\d)(?![0-9])") %>%
gather(key, val, -one_of("ID", "v1")) %>%
unite(key_v1, key, v1) %>%
spread(key_v1, val)
# ID X1_l X1_r X2_l X2_r X3_l X3_r X4_l X4_r
#1 1 1 2 5 6 9 10 13 14
#2 2 4 3 8 7 12 11 16 15
答案 1 :(得分:3)
temp = gsub("\\D+", "", df$ID)
data.frame(ID = unique(temp), do.call(cbind, lapply(X = c("l", "r"),
FUN = function(f) setNames(object = data.frame(sapply(X = names(df)[-1],
FUN = function(cx) sapply(X = unique(temp),
FUN = function(rx) df[df$ID == paste0(rx,f),cx]))),
nm = paste(names(df)[-1], f, sep = "_")))))
# ID X1_l X2_l X3_l X4_l X1_r X2_r X3_r X4_r
#1 1 1 5 9 13 2 6 10 14
#2 2 4 8 12 16 3 7 11 15