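I am trying to create a count table from a data frame that looks like this:
df <- data.frame("Spring" = c("skirt, pants, shirt", "tshirt"),
                 "Summer" = c("shorts, skirt", "pants, shoes"),
                 Fall = c("Scarf", "purse, pants"))
               Spring        Summer         Fall
1 skirt, pants, shirt shorts, skirt        Scarf
2              tshirt  pants, shoes purse, pants
and end up with a count table that looks like this:
output <- data.frame("Spring" = 4, "Summer" = 4, Fall = 3)
  Spring Summer Fall
1      4      4    3
So, for each season, I want to count the unique values in its column. I am having trouble because several values are separated by commas inside a single cell. I tried using length(unique()), but because of the number of columns it did not give me the correct numbers.
Thanks for your help!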
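To illustrate why length(unique()) alone falls short on this data, here is a minimal sketch. It assumes the df from the question, rebuilt with stringsAsFactors = FALSE so that the columns are plain character vectors; the names cells_spring and items_spring are made up for this illustration.

```r
# Same data as in the question, kept as character rather than factor
df <- data.frame(Spring = c("skirt, pants, shirt", "tshirt"),
                 Summer = c("shorts, skirt", "pants, shoes"),
                 Fall = c("Scarf", "purse, pants"),
                 stringsAsFactors = FALSE)

# Each cell is one string, so unique() compares whole cells, not items
cells_spring <- length(unique(df$Spring))

# Splitting on ", " first turns the cells into individual items
items_spring <- length(unique(unlist(strsplit(df$Spring, ", "))))

print(c(cells = cells_spring, items = items_spring))
```

For the Spring column, the first count is 2 (the two cells) and the second is 4 (the four distinct items), so the cells have to be split before counting.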
Answer 0 (score: 1)
One tidyverse possibility could be:
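library(dplyr)
library(tidyr)

df %>%
  mutate_if(is.factor, as.character) %>%  # strsplit() needs character input
  gather(var, val) %>%                    # long format: one row per cell
  mutate(val = strsplit(val, ", ")) %>%   # split each cell into its items
  unnest(val) %>%                         # one row per item
  group_by(var) %>%
  summarise(val = n_distinct(val))        # distinct items per season
  var      val
  <chr>  <int>
1 Fall       3
2 Spring     4
3 Summer     4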
If you want to match the desired output exactly, you can add spread():
df %>%
  mutate_if(is.factor, as.character) %>%
  gather(var, val) %>%
  mutate(val = strsplit(val, ", ")) %>%
  unnest(val) %>%
  group_by(var) %>%
  summarise(val = n_distinct(val)) %>%
  spread(var, val)                        # back to one column per season
   Fall Spring Summer
  <int>  <int>  <int>
1     3      4      4
Or, using @Sonny's basic idea (this requires only dplyr):
df %>%
  mutate_if(is.factor, as.character) %>%
  summarise_all(list(~ n_distinct(unlist(strsplit(., ", ")))))
  Spring Summer Fall
1      4      4    3
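As a possible variation on the reshaping approach above, the same count can be sketched with the newer tidyr verbs pivot_longer() and separate_rows(). This is not what the answer's author used. It assumes tidyr 1.0.0 or later and rebuilds df with stringsAsFactors = FALSE to sidestep the factor conversion; the name counts is made up for this illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, kept as character rather than factor
df <- data.frame(Spring = c("skirt, pants, shirt", "tshirt"),
                 Summer = c("shorts, skirt", "pants, shoes"),
                 Fall = c("Scarf", "purse, pants"),
                 stringsAsFactors = FALSE)

counts <- df %>%
  # long format: one row per (season, cell) pair
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  # one row per item; the separator is a comma plus any following whitespace
  separate_rows(val, sep = ",\\s*") %>%
  group_by(var) %>%
  summarise(val = n_distinct(val))

print(counts)
```

Splitting on a comma followed by optional whitespace, rather than on the literal ", ", keeps the count correct even if some cells lack the space after the comma.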
Answer 1 (score: 1)
Using summarise_all:
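getCount <- function(x) {
  x <- as.character(x)  # factors must be converted before strsplit()
  # split on a comma plus any following spaces, so "pants" and " pants" match
  length(unique(unlist(strsplit(x, ",\\s*"))))
}
library(dplyr)
df %>%
  summarise_all(getCount)  # funs() is deprecated since dplyr 0.8.0
  Spring Summer Fall
1      4      4    3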
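If loading dplyr is not desirable, the same idea can be sketched in base R alone. This is a minimal illustration, not part of the original answer; count_items and counts are hypothetical names, and df is rebuilt here so the block runs on its own.

```r
# Same data as in the question
df <- data.frame(Spring = c("skirt, pants, shirt", "tshirt"),
                 Summer = c("shorts, skirt", "pants, shoes"),
                 Fall = c("Scarf", "purse, pants"))

# Count the distinct comma-separated items in one column
count_items <- function(x) {
  # as.character() covers factor columns; trimws() drops the spaces
  # left behind after splitting on the bare comma
  items <- trimws(unlist(strsplit(as.character(x), ",")))
  length(unique(items))
}

# Apply to every column and collect the results as a one-row data frame
counts <- as.data.frame(lapply(df, count_items))
print(counts)
```

Because lapply() returns a named list with one count per column, as.data.frame() yields a single row with the columns in their original order (Spring, Summer, Fall).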