My data contains sample names that need to be split apart to create new columns.
sample
P10.1
P11.2
S1.1
S3.3
Using the sample ID data, I need to make three new columns: tissue, plant, and stage.
sample tissue plant stage
P10.1 P 10 1
P11.2 P 11 2
S1.1 S 1 1
S3.3 S 3 3
Is there a way to extract the data from the sample column to populate the three new columns?
Answer 0 (score: 6)
Using dplyr and tidyr.
First we insert a "." after the first character of the sample code, then we separate the sample into 3 columns.
library(dplyr)
library(tidyr)
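df %>%
  # insert "." after the first character; refer to the column directly rather than df$sample
  mutate(sample = paste0(substring(sample, 1, 1), ".", substring(sample, 2))) %>%
  # split on the non-alphanumeric "." and keep the (modified) sample column
  separate(sample, into = c("tissue", "plant", "stage"), remove = FALSE)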
sample tissue plant stage
1 P.10.1 P 10 1
2 P.11.2 P 11 2
3 S.1.1 S 1 1
4 S.3.3 S 3 3
Data:
df <- structure(list(sample = c("P10.1", "P11.2", "S1.1", "S3.3")),
.Names = "sample",
class = "data.frame",
row.names = c(NA, -4L))
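The two steps described above can be sketched separately so that the intermediate value is visible. This is only an illustration under some assumptions: the helper column name `dotted` is hypothetical (it is not part of the original answer), and `convert = TRUE` is added so that plant and stage come back as integers while the original sample column is left untouched.

```r
library(dplyr)
library(tidyr)

df <- data.frame(sample = c("P10.1", "P11.2", "S1.1", "S3.3"),
                 stringsAsFactors = FALSE)

# Step 1: insert "." after the first character into a helper column
# (`dotted` is a hypothetical name used only for this sketch)
step1 <- df %>%
  mutate(dotted = paste0(substring(sample, 1, 1), ".", substring(sample, 2)))
step1$dotted
# "P.10.1" "P.11.2" "S.1.1" "S.3.3"

# Step 2: split the helper column on the literal "."; the helper column is
# dropped (remove = TRUE is the default) and numeric pieces are converted
result <- step1 %>%
  separate(dotted, into = c("tissue", "plant", "stage"),
           sep = "\\.", convert = TRUE)
result
```

Writing to a helper column instead of overwriting sample keeps the original IDs (for example "P10.1" rather than "P.10.1") in the result.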
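Answer 1 (score: 3)
Similar to @phiver, but using a regular expression.
Within pattern there are three capture groups:
([A-Z]) matches the single capital letter that starts the string (tissue)
the first (\\d{1,2}) matches the one or two digits before the period (plant)
the second (\\d{1,2}) matches the one or two digits after the period (stage)
The sub() function pulls out those capture groups, which are then placed in new variables.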
library(magrittr)
pattern <- "^([A-Z])(\\d{1,2})\\.(\\d{1,2})$"
df %>%
dplyr::mutate(
tissue = sub(pattern, "\\1", sample),
plant = as.integer(sub(pattern, "\\2", sample)),
stage = as.integer(sub(pattern, "\\3", sample))
)
Result (shown with str()):
'data.frame': 4 obs. of 4 variables:
$ sample: chr "P10.1" "P11.2" "S1.1" "S3.3"
$ tissue: chr "P" "P" "S" "S"
$ plant : int 10 11 1 3
$ stage : int 1 2 1 3
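As a variation on the same idea, all three capture groups can be pulled out in one pass with base R's regexec() and regmatches() instead of three separate sub() calls. This is a minimal sketch, not the answer's own method; it assumes every sample matches the pattern (a non-matching string would yield an empty match and be dropped by rbind()), and the object names `m`, `parts` and `out` are made up for illustration.

```r
samples <- c("P10.1", "P11.2", "S1.1", "S3.3")
pattern <- "^([A-Z])(\\d{1,2})\\.(\\d{1,2})$"

# regexec() locates the full match and each capture group;
# regmatches() turns those positions into character vectors
m <- regmatches(samples, regexec(pattern, samples))

# One row per sample: column 1 is the full match, columns 2-4 the groups
parts <- do.call(rbind, m)

out <- data.frame(
  sample = parts[, 1],
  tissue = parts[, 2],
  plant  = as.integer(parts[, 3]),
  stage  = as.integer(parts[, 4]),
  stringsAsFactors = FALSE
)
str(out)
```

One thing worth keeping in mind with the sub() version: when a string does not match the pattern, sub() returns it unchanged, so checking the input with grepl(pattern, samples) beforehand may be worthwhile.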
Answer 2 (score: 1)
This is similar to phiver's answer, but uses separate twice. Note that we can specify a position index in the sep argument.
library(tidyr)
dat2 <- dat %>%
separate(sample, into = c("tissue", "number"), sep = 1, remove = FALSE) %>%
separate(number, into = c("plant", "stage"), sep = "\\.", remove = TRUE, convert = TRUE)
dat2
# sample tissue plant stage
# 1 P10.1 P 10 1
# 2 P11.2 P 11 2
# 3 S1.1 S 1 1
# 4 S3.3 S 3 3
Data:
dat <- read.table(text = "sample
P10.1
P11.2
S1.1
S3.3",
header = TRUE, stringsAsFactors = FALSE)
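To make the positional sep a little more concrete, the sketch below compares separate() with sep = 1 against the same split done by hand with base R substr() and substring(). The object names `by_pos` and `by_hand` are hypothetical and only serve the comparison.

```r
library(tidyr)

dat <- data.frame(sample = c("P10.1", "P11.2", "S1.1", "S3.3"),
                  stringsAsFactors = FALSE)

# Numeric sep: split by character position, here after the 1st character
by_pos <- separate(dat, sample, into = c("tissue", "number"), sep = 1)

# The same split written out by hand in base R
by_hand <- data.frame(
  tissue = substr(dat$sample, 1, 1),   # first character only
  number = substring(dat$sample, 2),   # everything from the 2nd character on
  stringsAsFactors = FALSE
)

by_pos
by_hand
```

Both approaches should give tissue values "P", "P", "S", "S" and leave "10.1", "11.2", "1.1", "3.3" in number, which the second separate() call then splits on the literal period.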