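I recently started using R and have never coded before, so I am stuck on the following problem:
I have two data frames (with different numbers of rows and columns) that need to be merged. The merge itself is not the problem, but the participant variable is coded differently in the two data frames. The first data frame identifies participants as -1, -2, -3, and so on. The second identifies them as STR_PP001, STR_PP002, STR_PP003, and so on.
The goal is to combine all the data into one data frame that identifies participants as STR_PP001 (or whatever the participant's number is). Is there a way to convert the column in the first data frame so that it shows the participant code in the STR_PP format instead of -1?
Thanks in advance!
Answer 0: (score: 2)
Sample data: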
a <- paste0("-", 1:4)
a
#[1] "-1" "-2" "-3" "-4"
Name conversion:
b <- paste0("STR_PP00", sapply(strsplit(a, "-"),"[[", 2))
b
#[1] "STR_PP001" "STR_PP002" "STR_PP003" "STR_PP004"
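Basically, this snippet splits each string on "-"; the output of strsplit() is a list. Because every string starts with "-", the first element of each split is an empty string and the number is the second element, so we use sapply() to pick the second element of each vector in the list. Finally, paste0() pastes the extracted number onto the desired prefix. Note that the hard-coded "00" only gives the right width for single-digit IDs.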
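To make the intermediate steps visible, here is a small sketch of the same pipeline broken into pieces. The names `parts` and `digits` are only illustrative and are not part of the original answer.

```r
a <- paste0("-", 1:4)

# strsplit() returns a list with one character vector per input string.
# Each string starts with "-", so the first piece is "" and the second is the number.
parts <- strsplit(a, "-")
parts[[1]]

# Pull the second piece out of every list element.
digits <- sapply(parts, "[[", 2)

# Glue the fixed prefix onto the extracted digits (single-digit IDs only).
b <- paste0("STR_PP00", digits)
b
```

Inspecting `parts[[1]]` is a quick way to confirm why index 2, and not index 1, holds the participant number.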
Update to also handle higher IDs:
a <- paste0("-", 1:128)
b <- "STR_PP"
# Number of zeros required: nchar(a) - 1 drops the "-" that nchar() counts,
# subtracting 3 accounts for the maximum ID width (3 digits for id > 99),
# and multiplying by -1 turns the result into a positive count
zerolen <- ((nchar(a) - 1) - 3) * (-1)
# Now one can add the amount of required 0 based on the length of ID number
c <- sapply(zerolen, function(x){
  paste(as.character((rep(0, x))), collapse = "")
})
# Again combine with paste()
paste0(b, c, sapply(strsplit(a, "-"),"[[", 2))
# Which results in:
head(paste0(b, c, sapply(strsplit(a, "-"),"[[", 2)), 20)
# [1] "STR_PP001" "STR_PP002" "STR_PP003" "STR_PP004" "STR_PP005"
# "STR_PP006" "STR_PP007" "STR_PP008" "STR_PP009" "STR_PP010"
# [11] "STR_PP011" "STR_PP012" "STR_PP013" "STR_PP014" "STR_PP015"
# "STR_PP016" "STR_PP017" "STR_PP018" "STR_PP019" "STR_PP020"
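As an alternative to counting zeros by hand, the padding can be delegated to a format string. This is a sketch that is not part of the original answer; it assumes every ID is a "-" followed by a whole number of at most three digits, and the name `ids` is illustrative.

```r
a <- paste0("-", 1:128)

# Drop the leading "-" and convert what remains to an integer.
ids <- as.integer(sub("-", "", a, fixed = TRUE))

# "%03d" left-pads the integer with zeros to a width of three digits.
b <- sprintf("STR_PP%03d", ids)
head(b, 3)
tail(b, 1)
```

Because sprintf() is vectorised, no sapply() call is needed, and changing the width only means changing the 3 in the format string.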
Answer 1: (score: 1)
This nested ifelse statement, using gsub and backreferences, works:
a <- c("-1", "-3", "-10", "-55", "-100", "-112")
ifelse(grepl("-\\d$", a), paste0("STR_PP00", gsub("-(\\d)", "\\1", a)),
       ifelse(grepl("-\\d{2}$", a), paste0("STR_PP0", gsub("-(\\d+)", "\\1", a)),
              paste0("STR_PP", gsub("-(\\d+)", "\\1", a))))
#[1] "STR_PP001" "STR_PP003" "STR_PP010" "STR_PP055" "STR_PP100" "STR_PP112"
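Answer 2: (score: 0)
One way that will certainly work: if the variable in the data frame that uses the -1 style codes is called VAR, you can do the following: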
VAR[which(VAR == -1)] <- "STR_PP001"
and so on for the other numbers. If -1 is stored as a character, you may need VAR[which(VAR == "-1")] <- "STR_PP001" instead.
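Replacing values one at a time becomes tedious with many participants, so here is a sketch that recodes the whole column at once and then merges. The data frames `df1` and `df2` and the columns `participant`, `score`, and `age` are made up for illustration; the real column names in the question are unknown.

```r
# Hypothetical first data frame: participants coded as "-1", "-2", ...
df1 <- data.frame(participant = c("-1", "-2", "-3"),
                  score = c(10, 20, 30),
                  stringsAsFactors = FALSE)

# Hypothetical second data frame: participants coded as "STR_PP001", ...
df2 <- data.frame(participant = c("STR_PP001", "STR_PP002", "STR_PP004"),
                  age = c(25, 31, 40),
                  stringsAsFactors = FALSE)

# Recode the whole column in one step: strip the "-" and zero-pad to three digits.
df1$participant <- sprintf("STR_PP%03d",
                           as.integer(sub("-", "", df1$participant, fixed = TRUE)))

# all = TRUE keeps participants that appear in only one of the two data frames.
merged <- merge(df1, df2, by = "participant", all = TRUE)
merged
```

With all = TRUE, participants missing from one data frame get NA in that data frame's columns; drop the argument to keep only participants present in both.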