Question

I want to create a new column based on the conditions below:

if the `str` column only contains `A` then insert `A`
if the `str` column only contains `B` then insert `B`
if the `str` column only contains `A` and `B` then insert `AB`

df<-read.table(text="
ID   str
1    A
1    A
1    AA
1    ABB
2    BA
2    BB", header = TRUE)

ID   str   simplify_str
1    A        A
1    A        A
1    AA       A
1    ABB      AB
2    BA       AB
2    BB       B

Answer 1

As a tidyverse option, you can use dplyr::case_when together with stringr::str_detect:

library(dplyr)
library(stringr)
df %>%
    mutate(simplify_str = case_when(
        str_detect(str, "^A+$") ~ "A",
        str_detect(str, "^B+$") ~ "B",
        TRUE ~ "AB"))
#  ID str simplify_str
#1  1   A            A
#2  1   A            A
#3  1  AA            A
#4  1 ABB           AB
#5  2  BA           AB
#6  2  BB            B
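One thing to keep in mind with the code above: the `TRUE ~ "AB"` fallback labels every remaining row as "AB", including strings with other letters and missing values. If that matters for your data, a minimal sketch along these lines makes the third condition explicit. The data frame `df2` and its extra values "AC" and NA are made up for illustration and are not part of the question.

```r
library(dplyr)
library(stringr)

# Hypothetical input: "AC" and NA are added only to show the fallback behaviour
df2 <- data.frame(str = c("AA", "BA", "BB", "AC", NA), stringsAsFactors = FALSE)

df2 <- df2 %>%
    mutate(simplify_str = case_when(
        str_detect(str, "^A+$")    ~ "A",
        str_detect(str, "^B+$")    ~ "B",
        # case_when uses the first matching condition, so this branch is only
        # reached by strings made up of A and B that contain both letters
        str_detect(str, "^[AB]+$") ~ "AB",
        # anything else (other letters, NA) stays missing instead of becoming "AB"
        TRUE                       ~ NA_character_))

df2
```

Whether unmatched rows should become NA or something else depends on your data, so treat the last branch as a placeholder.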

Answer 2

Using a plain data.frame with base R:

As <- grep("A",df$str)
Bs <- grep("B",df$str)
df$simplify_str <- ""
df$simplify_str[As] <- paste0(df$simplify_str[As],"A")
df$simplify_str[Bs] <- paste0(df$simplify_str[Bs],"B")

df
  ID str simplify_str
1  1   A            A
2  1   A            A
3  1  AA            A
4  1 ABB           AB
5  2  BA           AB
6  2  BB            B
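The same idea can be sketched without the intermediate index vectors: `grep` returns the positions of matching rows, while `grepl` returns a logical vector that can be passed straight to `ifelse`. This is only a variant of the approach above, not a different method. The data frame is rebuilt here so the snippet runs on its own, and the names `has_A` and `has_B` are illustrative.

```r
# Rebuild the example data from the question so the snippet is self-contained
df <- data.frame(ID  = c(1, 1, 1, 1, 2, 2),
                 str = c("A", "A", "AA", "ABB", "BA", "BB"),
                 stringsAsFactors = FALSE)

# Logical vectors: does each string contain the letter at all?
# fixed = TRUE treats the pattern as a literal string rather than a regex
has_A <- grepl("A", df$str, fixed = TRUE)
has_B <- grepl("B", df$str, fixed = TRUE)

# Paste "A" and/or "B" together depending on which letters are present
df$simplify_str <- paste0(ifelse(has_A, "A", ""), ifelse(has_B, "B", ""))

df
```

A row that contains neither letter ends up with an empty string, just as in the original version.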

Answer 3

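A general solution in base R that splits each string into single characters and pastes the unique characters back together in sorted order.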

df$simplify_str <- sapply(strsplit(as.character(df$str), ""), 
                   function(x) paste(unique(sort(x)), collapse = ""))

df
#  ID str simplify_str
#1  1   A            A
#2  1   A            A
#3  1  AA            A
#4  1 ABB           AB
#5  2  BA           AB
#6  2  BB            B
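Because this approach does not hard-code the letters, it should also work for strings with more than two distinct characters. Here is a small sketch that wraps it in a helper; the name `simplify_chars` and the inputs "CAB" and "ZZZ" are made up for illustration. It uses `vapply` instead of `sapply` so that the result is always a character vector.

```r
# Hypothetical helper: reduce each string to its sorted set of unique characters
simplify_chars <- function(x) {
    chars <- strsplit(as.character(x), "")   # one character vector per string
    vapply(chars,
           function(ch) paste(sort(unique(ch)), collapse = ""),
           character(1))                     # each result must be a single string
}

res <- simplify_chars(c("ABB", "BA", "CAB", "ZZZ"))
res
```

On the question's data, `df$simplify_str <- simplify_chars(df$str)` should give the same column as the one-liner above.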

Loops and conditions on strings in R
