How can I remove duplicate characters from the strings in a column using R? For example, this is my column:
df<- data.frame(name = c(A="a,a,b,c,d,d,d",
B="a,b,b,b,f",
C="d,d,d,d",
D="a,a"))
My expected column:
df<- data.frame(name = c(A="a,b,c,d",
B="a,b,f",
C="d",
D="a"))
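Answer 0 (score: 0)
Using the tidyverse, we first add the row names as a column, split the comma-separated strings into one row per value with separate_rows, group_by rowname, drop the duplicated values, and then use toString to collapse each group back into a comma-separated string.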
library(tidyverse)
df %>%
rownames_to_column() %>%
separate_rows(name, sep = ",") %>%
group_by(rowname) %>%
filter(!duplicated(name)) %>%
summarise(name = toString(name)) %>%
column_to_rownames()
# name
#A a, b, c, d
#B a, b, f
#C d
#D a
A base R approach using sapply, essentially the same as @tmfmnk's:
sapply(strsplit(as.character(df$name), ","), function(x) toString(unique(x)))
#[1] "a, b, c, d" "a, b, f" "d" "a"
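Note that toString joins the values with ", " (comma plus space), while the expected column in the question has no spaces. A minimal sketch that matches the expected output exactly, assuming the column can be treated as character; dedupe_csv is a hypothetical helper name, not part of any package:

```r
# Same data as in the question; stringsAsFactors = FALSE keeps `name` a character column
df <- data.frame(name = c(A = "a,a,b,c,d,d,d",
                          B = "a,b,b,b,f",
                          C = "d,d,d,d",
                          D = "a,a"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: split each string on `sep`, keep the first occurrence
# of every item, and join the items again with the same separator (no spaces)
dedupe_csv <- function(x, sep = ",") {
  parts <- strsplit(as.character(x), sep, fixed = TRUE)
  vapply(parts, function(p) paste(unique(p), collapse = sep), character(1))
}

df$name <- dedupe_csv(df$name)
df$name
# values: "a,b,c,d", "a,b,f", "d", "a"
```

vapply is used instead of sapply here so that the result is always a character vector, even for an empty input.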
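Answer 1 (score: 0)
One dplyr possibility could be:
library(dplyr)
df %>%
rowwise() %>%
# as.character() guards against name being a factor (the data.frame default before R 4.0)
mutate(name = toString(unique(unlist(strsplit(as.character(name), ",")))))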
name
<chr>
1 a, b, c, d
2 a, b, f
3 d
4 a
The same in base R:
sapply(as.character(df$name), function(x) toString(unique(unlist(strsplit(x, ",")))), USE.NAMES = FALSE)
Answer 2 (score: 0)
An option with map and strsplit:
library(tidyverse)
df %>%
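# map_chr() returns a character vector, so name stays an ordinary column instead of a list column
mutate(name = strsplit(as.character(name), ",") %>%
map_chr(~toString(unique(.x))))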
# name
#1 a, b, c, d
#2 a, b, f
#3 d
#4 a
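Or, in base R, using a regular expression. This keeps the original separator without spaces, but it only collapses consecutive repeats of single lowercase letters, which is enough for the example data: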
sub(",$", "", gsub("([a-z],)\\1+", "\\1", paste0(df$name, ",")))
#[1] "a,b,c,d" "a,b,f" "d" "a"
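To see where the regular expression and the split-based approaches differ, here is a small sketch on an input that is not in the question: a string whose duplicates are not adjacent. The input "a,b,a" and the variable names are made up for illustration.

```r
# Hypothetical input with a non-adjacent duplicate ("a" appears first and last)
x <- "a,b,a"

# Regex approach from the answer: only runs of the same letter are collapsed,
# so the second "a" survives
by_regex <- sub(",$", "", gsub("([a-z],)\\1+", "\\1", paste0(x, ",")))

# Split-based approach: unique() drops every later occurrence, adjacent or not
by_split <- paste(unique(strsplit(x, ",", fixed = TRUE)[[1]]), collapse = ",")

by_regex  # "a,b,a"
by_split  # "a,b"
```

If the values in the column can repeat out of order, or are longer than one letter, the strsplit plus unique route is the safer choice.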