I have a huge data frame like this:
scan_id sample
1 s10w_00001 1
2 s10w_00002 2
3 s10w_00003 3
4 s10w_00004 4
5 s11d_00001 5
6 s11d_00002 6
7 s11d_00003 7
8 s11d_00004 8
9 s11w_00001 9
10 s11w_00002 10
11 s11w_00003 11
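I want to add another column called size. It should be derived from the scan_id column: every scan_id ending in 00001 should be big, 00002 = medium, 00003 = small, and 00004 = extra small.
The data frame should look like this:
scan_id sample size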
1 s10w_00001 1 big
2 s10w_00002 2 medium
3 s10w_00003 3 small
4 s10w_00004 4 extra small
5 s11d_00001 5 big
6 s11d_00002 6 medium
7 s11d_00003 7 small
8 s11d_00004 8 extra small
9 s11w_00001 9 big
10 s11w_00002 10 medium
11 s11w_00003 11 small
How can I do this?
Answer 0 (score: 2)
Does this work for you?
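library(tidyverse)
df %>%
  # Split scan_id into its prefix ("scan") and numeric suffix ("id")
  separate(col = scan_id, into = c("scan", "id"), sep = "_") %>%
  mutate(size = case_when(id == "00001" ~ "big",
                          id == "00002" ~ "medium",
                          id == "00004" ~ "extra small",
                          TRUE ~ "small")) %>%  # everything else, including 00003
  # Rejoin the two pieces into the original scan_id column
  unite(col = "scan_id", c("scan", "id"), sep = "_")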
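As an alternative to the case_when() above, the suffix-to-size mapping can be kept in a named vector and looked up directly. This is only a minimal base R sketch: the name size_lookup and the five-row example data are made up for illustration, and it assumes the suffix is always the part after the last underscore.

```r
# Example data: the first five rows of the question's data frame
df <- data.frame(
  scan_id = c("s10w_00001", "s10w_00002", "s10w_00003",
              "s10w_00004", "s11d_00001"),
  sample = 1:5,
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: suffix -> size label
size_lookup <- c("00001" = "big",
                 "00002" = "medium",
                 "00003" = "small",
                 "00004" = "extra small")

# Keep only what follows the last underscore, e.g. "00001"
suffix <- sub(".*_", "", df$scan_id)

# Index the lookup by suffix; suffixes not in the table become NA
df$size <- unname(size_lookup[suffix])

print(df)
```

Unlike a TRUE ~ "small" fallback, an unexpected suffix shows up as NA here, which may make bad IDs easier to spot.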
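Answer 1 (score: 0)
Here is a solution. Each pattern is anchored to the end of scan_id, and any scan_id that matches none of the listed suffixes falls back to "small":
library(dplyr)
df_clean <- df %>%
  # Start with "big" for 00001 and "small" as the default
  mutate(size = ifelse(grepl("_00001$", scan_id), "big", "small")) %>%
  # Overwrite the default for each remaining suffix
  mutate(size = ifelse(grepl("_00002$", scan_id), "medium", size)) %>%
  mutate(size = ifelse(grepl("_00003$", scan_id), "small", size)) %>%
  mutate(size = ifelse(grepl("_00004$", scan_id), "extra small", size))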
> df_clean
scan_id sample size
1 s10w_00001 1 big
2 s10w_00002 2 medium
3 s10w_00003 3 small
4 s10w_00004 4 extra small
5 s11d_00001 5 big
6 s11d_00002 6 medium
7 s11d_00003 7 small
8 s11d_00004 8 extra small
9 s11w_00001 9 big
10 s11w_00002 10 medium
11 s11w_00003 11 small
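One thing to watch with grepl() here: a pattern without an anchor matches anywhere in the string, so it can also hit an ID whose prefix happens to contain the same digits. The short sketch below illustrates the difference. The second ID is invented for the example and does not occur in the question's data.

```r
# Hypothetical IDs; the second one contains "00001" in its prefix
ids <- c("s10w_00001", "s00001_00003")

# Unanchored: matches "00001" anywhere in the string
unanchored <- grepl("00001", ids)

# Anchored: matches only when the string ends in "_00001"
anchored <- grepl("_00001$", ids)

print(unanchored)  # TRUE TRUE
print(anchored)    # TRUE FALSE
```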
You should generally provide your data using dput, which converts a data.frame into easily readable text. Here is the data I used:
df <- read.table(text =
"scan_id sample
1 s10w_00001 1
2 s10w_00002 2
3 s10w_00003 3
4 s10w_00004 4
5 s11d_00001 5
6 s11d_00002 6
7 s11d_00003 7
8 s11d_00004 8
9 s11w_00001 9
10 s11w_00002 10
11 s11w_00003 11", header = TRUE)
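For reference, this is roughly how dput() is used when sharing data. It is a small sketch on a three-row example: the names df_small, df_text and df_copy are made up here, and the round trip at the end is only meant to show that the printed text is enough to rebuild the object.

```r
# A small example data frame (first three rows of the data above)
df_small <- data.frame(
  scan_id = c("s10w_00001", "s10w_00002", "s10w_00003"),
  sample = 1:3,
  stringsAsFactors = FALSE
)

# Prints R code that recreates df_small; paste this into your question
dput(df_small)

# Round trip: capture the printed text and evaluate it back into an object
df_text <- paste(capture.output(dput(df_small)), collapse = "\n")
df_copy <- eval(parse(text = df_text))

# The rebuilt data frame should match the original
print(isTRUE(all.equal(df_small, df_copy)))
```

For a huge data frame, dput(head(df, 20)) keeps the pasted text short while still giving others something runnable.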