I want to create a new column based on the following conditions on the `str` column:
if the `str` column only contains `A` then insert `A`
if the `str` column only contains `B` then insert `B`
if the `str` column only contains `A` and `B` then insert `AB`
df <- read.table(text = "
ID str
1 A
1 A
1 AA
1 ABB
2 BA
2 BB", header = TRUE)
Desired output:
ID str simplify_str
1 A A
1 A A
1 AA A
1 ABB AB
2 BA AB
2 BB B
Answer 0 (score: 3)
As a tidyverse option, you can use dplyr::case_when together with stringr::str_detect:
library(dplyr)
library(stringr)
df %>%
  mutate(simplify_str = case_when(
    str_detect(str, "^A+$") ~ "A",
    str_detect(str, "^B+$") ~ "B",
    TRUE ~ "AB"))
# ID str simplify_str
#1 1 A A
#2 1 A A
#3 1 AA A
#4 1 ABB AB
#5 2 BA AB
#6 2 BB B
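To see why the anchored patterns work, here is a minimal base R sketch. It uses grepl, the base counterpart of str_detect (note the swapped argument order), and a hypothetical vector `x` rather than the question's data frame. `^A+$` matches only strings made up entirely of `A`, so mixed strings fall through to the final branch.

```r
# Hypothetical test input, not taken from the question
x <- c("A", "AA", "ABB", "BA", "BB")

# "^" and "$" anchor the match to the whole string,
# so "A+" must account for every character
only_a <- grepl("^A+$", x)
only_b <- grepl("^B+$", x)

# Same branching as the case_when call: A-only, B-only, otherwise "AB"
result <- ifelse(only_a, "A", ifelse(only_b, "B", "AB"))
print(data.frame(x, only_a, only_b, result))
```

One thing to keep in mind: the fallback branch labels every remaining string as "AB", including values that contain other letters, so this assumes the column holds only A and B.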
Answer 1 (score: 2)
Using base R on the data.frame:
As <- grep("A",df$str)
Bs <- grep("B",df$str)
df$simplify_str <- ""
df$simplify_str[As] <- paste0(df$simplify_str[As],"A")
df$simplify_str[Bs] <- paste0(df$simplify_str[Bs],"B")
df
ID str simplify_str
1 1 A A
2 1 A A
3 1 AA A
4 1 ABB AB
5 2 BA AB
6 2 BB B
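The same idea can be written without the intermediate index vectors. This is only a sketch, assuming the column contains nothing but the letters A and B; `x` is a hypothetical stand-in for df$str.

```r
# Hypothetical stand-in for df$str
x <- c("A", "A", "AA", "ABB", "BA", "BB")

# Emit "A" where an A occurs and "B" where a B occurs, else an empty string
has_a <- ifelse(grepl("A", x), "A", "")
has_b <- ifelse(grepl("B", x), "B", "")

# Concatenating the two pieces gives "A", "B", or "AB"
simplify_str <- paste0(has_a, has_b)
print(simplify_str)
```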
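Answer 2 (score: 2)
A general solution in base R: split each string into its characters, then paste the unique characters back together in sorted order.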
df$simplify_str <- sapply(strsplit(as.character(df$str), ""),
                          function(x) paste(unique(sort(x)), collapse = ""))
df
# ID str simplify_str
#1 1 A A
#2 1 A A
#3 1 AA A
#4 1 ABB AB
#5 2 BA AB
#6 2 BB B
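The steps inside the sapply call can be traced on a single value. A small sketch for illustration, where the variable names are hypothetical and "BA" is one of the values from the `str` column:

```r
s <- "BA"

# Split into individual characters; strsplit returns a list, so take element 1
chars <- strsplit(s, "")[[1]]

# Sort first so the result does not depend on the original letter order
sorted <- sort(chars)

# Drop repeated letters, then glue the remainder back into one string
simplified <- paste(unique(sorted), collapse = "")
print(simplified)
```

Because it never names the letters, this approach should also cover inputs with other characters, which the pattern-based answers do not.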