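I have a data frame called df. I want to split the strings in columns a, b and c on the comma delimiter, then for each column get one new column holding the unique elements and another holding the count of those unique elements, as shown in the result below. How can we do this in R? Thanks for your help.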
a <- c("cat, cat, dog", "dog")
b <- c("cat")
c <- c("dog, dog", "cat")
df <- data.frame(position = c("1", "2"), a, b, c, stringsAsFactors = FALSE)
Desired result:
position a_uniq  b_uniq c_uniq a_uniq_counts b_uniq_counts c_uniq_counts
1        cat,dog cat    dog    2             1             1
2        dog     cat    cat    1             1             1
Answer 0 (score: 1)
Here is a solution using data.table:
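library(data.table)

# Count the distinct comma-separated tokens in one string
unique_counts <- function(str) {
  uniqueN(unlist(strsplit(gsub(" ", "", str), ",")))
}

# Collapse the distinct tokens of one string back into a comma-separated string
unique_strings <- function(str) {
  paste0(unique(unlist(strsplit(gsub(" ", "", str), ","))), collapse = ",")
}

a <- c("cat, cat, dog", "dog")
b <- c("cat")
c <- c("dog, dog", "cat")
df <- data.frame(position = c("1", "2"), a, b, c, stringsAsFactors = FALSE)
df <- as.data.table(df)

# Process every column except position, applying the helpers row by row
for (col in setdiff(colnames(df), "position")) {
  df[, (paste0(col, "_uniq")) := mapply(unique_strings, get(col), USE.NAMES = FALSE)]
  df[, (paste0(col, "_uniq_counts")) := mapply(unique_counts, get(col), USE.NAMES = FALSE)]
  df[, (col) := NULL]  # drop the original column
}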
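To see what the two helpers compute for a single string, and why they are applied row by row rather than to the whole column, here is a minimal base R sketch. It assumes that length(unique(x)) is an acceptable stand-in for uniqueN(x), and the tokens() helper is a hypothetical name introduced only for this illustration.

```r
# Hypothetical helper (not part of the answer): strip spaces, split on commas
tokens <- function(str) unlist(strsplit(gsub(" ", "", str), ","))

# Base R stand-ins for the answer's helpers
unique_strings <- function(str) paste0(unique(tokens(str)), collapse = ",")
unique_counts  <- function(str) length(unique(tokens(str)))

a <- c("cat, cat, dog", "dog")

one_string <- unique_strings(a[1])  # "cat,dog"
one_count  <- unique_counts(a[1])   # 2

# Passing the whole column pools the tokens of every row into one answer
pooled <- unique_counts(a)          # 2, a single number for both rows

# Applying the helper element by element gives one value per row
per_row <- vapply(a, unique_counts, integer(1), USE.NAMES = FALSE)  # 2 1

print(per_row)
```

Because unlist() flattens the tokens of every row together, the helpers only give per-row results when they are called once per element, which is what mapply does in the loop.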
Best!
Answer 1 (score: 1)
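Here is an option with the tidyverse. Use mutate_at to split the strings in each column at the delimiter ", " and count the distinct elements of each piece with n_distinct.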
library(tidyverse)

# funs() was deprecated in dplyr 0.8.0; a named list of formulas does the same job
df %>%
  mutate_at(vars(a:c), list(uniq_counts = ~ map_int(strsplit(., ", "), n_distinct)))
#   position             a   b        c a_uniq_counts b_uniq_counts c_uniq_counts
# 1        1 cat, cat, dog cat dog, dog             2             1             1
# 2        2           dog cat      cat             1             1             1
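The output above contains only the counts and keeps the original columns, while the question also asks for the unique elements themselves. A minimal base R sketch that builds both sets of columns could look like the following. It is not taken from either answer; the split pattern ",\\s*" (a comma followed by optional whitespace) and the variable names cols, parts, uniq, counts and result are choices made for this illustration.

```r
df <- data.frame(position = c("1", "2"),
                 a = c("cat, cat, dog", "dog"),
                 b = "cat",
                 c = c("dog, dog", "cat"),
                 stringsAsFactors = FALSE)

cols <- c("a", "b", "c")

# For each column, split every row on commas and keep the unique tokens per row
parts <- lapply(df[cols], function(x) lapply(strsplit(x, ",\\s*"), unique))

# One string of unique tokens per row, and one count per row
uniq   <- lapply(parts, function(p) vapply(p, paste, character(1), collapse = ","))
counts <- lapply(parts, lengths)

names(uniq)   <- paste0(cols, "_uniq")
names(counts) <- paste0(cols, "_uniq_counts")

# Column order matches the layout requested in the question
result <- data.frame(position = df$position, uniq, counts,
                     stringsAsFactors = FALSE)
print(result)
```

Splitting once and reusing parts for both the strings and the counts avoids running strsplit twice per column.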