How to do multiple gsub and mutate

Posted: 2019-03-02 21:39:57

Tags: r

I have a huge data frame like this:

      scan_id sample
1  s10w_00001      1
2  s10w_00002      2
3  s10w_00003      3
4  s10w_00004      4
5  s11d_00001      5
6  s11d_00002      6
7  s11d_00003      7
8  s11d_00004      8
9  s11w_00001      9
10 s11w_00002     10
11 s11w_00003     11

I want to add another column called size, whose value depends on the scan_id column: every scan_id ending in 00001 should be big, 00002 = medium, 00003 = small, and 00004 = extra small.

The data frame should look like this:

      scan_id sample        size
1  s10w_00001      1         big
2  s10w_00002      2      medium
3  s10w_00003      3       small
4  s10w_00004      4 extra small
5  s11d_00001      5         big
6  s11d_00002      6      medium
7  s11d_00003      7       small
8  s11d_00004      8 extra small
9  s11w_00001      9         big
10 s11w_00002     10      medium
11 s11w_00003     11       small

How can I do this?

2 answers:

Answer 0 (score: 2)

Does this work for you?

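library(tidyverse)

df %>%
  # split "s10w_00001" into scan = "s10w" and id = "00001"
  separate(col = scan_id, into = c("scan", "id"), sep = "_") %>%
  mutate(size = case_when(id == "00001" ~ "big",
                          id == "00002" ~ "medium",
                          id == "00003" ~ "small",
                          id == "00004" ~ "extra small")) %>%
  # glue the two parts back into the original scan_id
  unite(col = "scan_id", c("scan", "id"), sep = "_")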
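Since each suffix maps to exactly one label, the same result can probably be had without splitting and re-uniting the column: strip the prefix with sub() (the single-match sibling of gsub()) and look the suffix up in a named vector. This is a minimal base-R sketch, assuming the suffix is always the text after the last underscore; the names size_map and suffix are illustrative, not from the answer above.

```r
# Example data in the same shape as the question
df <- data.frame(
  scan_id = c("s10w_00001", "s10w_00002", "s10w_00003", "s10w_00004",
              "s11d_00001", "s11d_00002", "s11d_00003", "s11d_00004",
              "s11w_00001", "s11w_00002", "s11w_00003"),
  sample = 1:11,
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: suffix after the underscore -> size label
size_map <- c("00001" = "big",
              "00002" = "medium",
              "00003" = "small",
              "00004" = "extra small")

# ".*_" is greedy, so sub() removes everything up to the LAST underscore
suffix <- sub(".*_", "", df$scan_id)

# Index the lookup by name; suffixes missing from size_map become NA
df$size <- unname(size_map[suffix])
df
```

Unknown suffixes come out as NA rather than silently falling into a default label, which makes unexpected IDs easy to spot.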

Answer 1 (score: 0)

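Here is one solution. Note that any scan_id that does not end in 00001 to 00004 will get the size "small", because that is the default set by the first ifelse():

library(dplyr)
df_clean <- df %>%
  # start from "big" vs. a default of "small"
  mutate(size = ifelse(grepl("00001", scan_id), "big", "small")) %>%
  # then overwrite the rows that match the other suffixes
  mutate(size = ifelse(grepl("00002", scan_id), "medium", size)) %>%
  mutate(size = ifelse(grepl("00003", scan_id), "small", size)) %>%
  mutate(size = ifelse(grepl("00004", scan_id), "extra small", size))
> df_clean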
      scan_id sample        size
1  s10w_00001      1         big
2  s10w_00002      2      medium
3  s10w_00003      3       small
4  s10w_00004      4 extra small
5  s11d_00001      5         big
6  s11d_00002      6      medium
7  s11d_00003      7       small
8  s11d_00004      8 extra small
9  s11w_00001      9         big
10 s11w_00002     10      medium
11 s11w_00003     11       small
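One thing to watch with grepl("00001", scan_id): the pattern is unanchored, so it matches "00001" anywhere in the string, not only at the end. The sketch below uses a made-up ID, "s00001_00002", purely to show the difference, and then assigns sizes with end-anchored patterns ("_00001$") in base R. The patterns vector and the loop are illustrative assumptions, not part of the answer above.

```r
# "s00001_00002" is a hypothetical ID whose prefix happens to contain "00001"
scan_id <- c("s10w_00001", "s00001_00002", "s11d_00004")

grepl("00001", scan_id)    # TRUE  TRUE FALSE  (unanchored: false hit on row 2)
grepl("_00001$", scan_id)  # TRUE FALSE FALSE  (anchored to the end of the string)

# Hypothetical pattern -> label table using end-anchored regexes
patterns <- c("_00001$" = "big",
              "_00002$" = "medium",
              "_00003$" = "small",
              "_00004$" = "extra small")

# Start with NA and fill in each label where its pattern matches
size <- rep(NA_character_, length(scan_id))
for (p in names(patterns)) {
  size[grepl(p, scan_id)] <- patterns[[p]]
}
size  # "big" "medium" "extra small"
```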

Data

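In general you should share your data with dput(), which prints R code that recreates the data.frame, so others can paste it straight into their session. Here is the data I used: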

df <- read.table(text =
  "scan_id sample
  1  s10w_00001      1
  2  s10w_00002      2
  3  s10w_00003      3
  4  s10w_00004      4
  5  s11d_00001      5
  6  s11d_00002      6
  7  s11d_00003      7
  8  s11d_00004      8
  9  s11w_00001      9
  10 s11w_00002     10
  11 s11w_00003     11", header = TRUE)
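As a small illustration of the dput() advice, the sketch below uses a two-row stand-in for the question's data (an assumption, to keep it short), prints its dput() text, and checks that evaluating that text rebuilds an equal data frame.

```r
# Two-row stand-in for the question's data frame
df <- data.frame(scan_id = c("s10w_00001", "s10w_00002"),
                 sample = 1:2,
                 stringsAsFactors = FALSE)

# dput() prints R code that recreates the object; this is what to paste into a question
dput(df)

# Round trip: capture the printed code, parse and evaluate it, compare with the original
txt <- paste(capture.output(dput(df)), collapse = "\n")
df_copy <- eval(parse(text = txt))
isTRUE(all.equal(df, df_copy))
```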