How to get unique values and counts after splitting the strings in each cell of a table

Time: 2019-01-23 18:55:21

Tags: r stringr

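I have a data frame called df. I want to split the strings in columns a, b and c on the comma, and for each column get one column holding the unique elements and one holding the count of those unique elements, as in the result shown below. How can we do this in R? Thanks for your help.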

a <- c("cat, cat, dog", "dog")
b<- c("cat")
c<- c("dog, dog", "cat")

df <- data.frame(position= c("1","2"),a, b, c, stringsAsFactors = F)

The result I want:

position    a_uniq  b_uniq  c_uniq  a_uniq_counts   b_uniq_counts   c_uniq_counts
1           cat,dog cat     dog     2               1               1
2           dog     cat     cat     1               1               1

2 answers:

Answer 0: (score: 1)

Here is a solution using data.table:

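library(data.table)

# Count the distinct comma-separated tokens in one string (spaces are removed first)
unique_counts <- function(str){
  return(uniqueN(unlist(strsplit(gsub(" ", "" ,str), ","))))
}

# Join the distinct comma-separated tokens of one string back together with ","
unique_strings <- function(str){
  return(paste0(unique(unlist(strsplit(gsub(" ", "" ,str), ","))), collapse=","))
}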

a <- c("cat, cat, dog", "dog")
b<- c("cat")
c<- c("dog, dog", "cat")


df <- data.frame(position= c("1","2"),a, b, c, stringsAsFactors = F)
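df <- as.data.table(df)
# Process every column except `position`
for (i in setdiff(colnames(df), "position")){
  # Add the unique-values and unique-count columns by reference
  df[ , (paste0(i,"_uniq")) := sapply(get(i), unique_strings, USE.NAMES = FALSE)]
  df[ , (paste0(i,"_uniq_counts")) := sapply(get(i), unique_counts, USE.NAMES = FALSE)]
  # Drop the original column
  df[ , (i) := NULL]
}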

Best!
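
To see what the two helper functions do to a single cell, the steps can be traced by hand. This is only a minimal base-R sketch: the variable names (`cell`, `tokens`, `uniq`, and so on) are illustrative and not part of the answer, and `length(unique(...))` stands in for data.table's `uniqueN` so that no package is needed.

```r
# One cell from column a of the question's data
cell <- "cat, cat, dog"

# Step 1: remove the spaces so that splitting on "," leaves clean tokens
no_spaces <- gsub(" ", "", cell)

# Step 2: split on the comma; strsplit returns a list, so flatten it
tokens <- unlist(strsplit(no_spaces, ","))

# Step 3: keep each token once, in order of first appearance
uniq <- unique(tokens)

# Step 4: build the two values stored in the *_uniq and *_uniq_counts columns
uniq_string <- paste0(uniq, collapse = ",")
uniq_count <- length(uniq)

print(uniq_string)
print(uniq_count)
```

For this cell the sketch yields "cat,dog" and 2, matching row 1 of the desired result for column a.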

Answer 1: (score: 1)

Here is one option with tidyverse. Use mutate_at, split the strings at the delimiter ", ", and get the unique counts with n_distinct.

library(tidyverse)
df %>% 
      # funs() is deprecated in current dplyr; a named list with a formula yields the same column names
      mutate_at(vars(a:c), list(uniq_counts = ~ map_int(strsplit(., ", "), n_distinct)))
#  position             a   b        c a_uniq_counts b_uniq_counts c_uniq_counts
#1        1 cat, cat, dog cat dog, dog             2             1             1
#2        2           dog cat      cat             1             1             1
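
The output above contains the counts but not the `_uniq` columns the question asks for. One way to produce both is sketched below in base R. It is not taken from either answer, and it assumes that the columns to process are exactly a, b and c and that tokens are separated by a comma with optional surrounding spaces. The helper `split_unique` and the names `cols` and `result` are hypothetical.

```r
# Data from the question (b is written out in full rather than recycled)
df <- data.frame(position = c("1", "2"),
                 a = c("cat, cat, dog", "dog"),
                 b = c("cat", "cat"),
                 c = c("dog, dog", "cat"),
                 stringsAsFactors = FALSE)

# Split every cell on commas (optionally padded with spaces) and
# keep the distinct tokens of each cell; returns a list, one entry per row
split_unique <- function(x) lapply(strsplit(x, "\\s*,\\s*"), unique)

cols <- c("a", "b", "c")   # assumed set of columns to process
result <- df["position"]   # start from the key column only

for (col in cols) {
  tokens <- split_unique(df[[col]])
  # Unique tokens of each cell, joined back with ","
  result[[paste0(col, "_uniq")]] <- vapply(tokens, paste, character(1), collapse = ",")
  # Number of unique tokens in each cell
  result[[paste0(col, "_uniq_counts")]] <- lengths(tokens)
}

print(result)
```

The columns come out grouped per source column (a_uniq, a_uniq_counts, b_uniq, ...) rather than in the order of the desired table; reordering with `result[, c(...)]` is left out to keep the sketch short.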