Question

I have a data frame df1:

A|A2|A3
-+--+--
a|ut|x
a|tv|y
a|ut|x
a|pq|y
a|ut|y
b|st|x
b|qp|x
b|nt|y
c|st|x
c|st|x
c|st|y
c|st|z

I would like to know the number of unique A3 values for each A2 within each A, i.e. I want the following output:

A|A2|freq
-+--+----
a|ut|2
a|tv|1
a|pq|1
b|st|1
b|qp|1
b|nt|1
c|st|3

I tried

count(df1, A, A2, A3)

but got an error.

Answer 1

With dplyr, you can use distinct to remove duplicate rows, then use count to summarise:

library(dplyr)

# tibble() replaces the deprecated data_frame()
df1 <- tibble(A = c("a", "a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
              A2 = c("ut", "tv", "ut", "pq", "ut", "st", "qp", "nt", "st", "st", "st", "st"),
              A3 = c("x", "y", "x", "y", "y", "x", "x", "y", "x", "x", "y", "z"))

# Drop duplicate rows, then count the remaining rows in each (A, A2) group
df2 <- df1 %>% distinct() %>% count(A, A2)

df2
#> # A tibble: 7 x 3
#>       A    A2     n
#>   <chr> <chr> <int>
#> 1     a    pq     1
#> 2     a    tv     1
#> 3     a    ut     2
#> 4     b    nt     1
#> 5     b    qp     1
#> 6     b    st     1
#> 7     c    st     3

Or, more generally, use n_distinct:

df1 %>% group_by(A, A2) %>% summarise(freq = n_distinct(A3))
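
One way to see why n_distinct is the more general choice: distinct() with no arguments compares whole rows, so it only gives the wanted counts when the data frame has no other columns. The sketch below is illustrative only; the extra id column and the df_extra name are hypothetical and not part of the question.

```r
library(dplyr)

df1 <- tibble(A  = c("a", "a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
              A2 = c("ut", "tv", "ut", "pq", "ut", "st", "qp", "nt", "st", "st", "st", "st"),
              A3 = c("x", "y", "x", "y", "y", "x", "x", "y", "x", "x", "y", "z"))

# Hypothetical extra column that makes every row unique
df_extra <- df1 %>% mutate(id = row_number())

# distinct() now removes nothing, so count() returns plain row counts:
# (a, ut) gets 3 and (c, st) gets 4 instead of 2 and 3
by_distinct <- df_extra %>% distinct() %>% count(A, A2)

# Restricting distinct() to the relevant columns restores the wanted result
by_cols <- df_extra %>% distinct(A, A2, A3) %>% count(A, A2, name = "freq")

# n_distinct() only looks at A3 within each group, so extra columns do not matter
by_n_distinct <- df_extra %>%
  group_by(A, A2) %>%
  summarise(freq = n_distinct(A3), .groups = "drop")
```

Here by_cols and by_n_distinct agree, while by_distinct does not. The name argument of count and the .groups argument of summarise assume a reasonably recent dplyr (1.0 or later).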

Answer 2

Here is an option using data.table:

library(data.table)
setDT(df1)[, .(freq = uniqueN(A3)), .(A, A2)]
#   A A2 freq
#1: a ut    2
#2: a tv    1
#3: a pq    1
#4: b st    1
#5: b qp    1
#6: b nt    1
#7: c st    3

Data

df1 <- structure(list(A = c("a", "a", "a", "a", "a", "b", "b", "b", 
"c", "c", "c", "c"), A2 = c("ut", "tv", "ut", "pq", "ut", "st", 
 "qp", "nt", "st", "st", "st", "st"), A3 = c("x", "y", "x", "y", 
"y", "x", "x", "y", "x", "x", "y", "z")), .Names = c("A", "A2", 
"A3"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-12L))
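
To make the uniqueN call above a little more concrete, here is a minimal sketch that computes the same counts in two ways: directly with uniqueN, and by first dropping duplicate (A, A2, A3) rows and then counting rows with .N. It builds a fresh data.table named dt (a name chosen here for illustration) instead of converting df1 in place with setDT.

```r
library(data.table)

dt <- data.table(A  = c("a", "a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
                 A2 = c("ut", "tv", "ut", "pq", "ut", "st", "qp", "nt", "st", "st", "st", "st"),
                 A3 = c("x", "y", "x", "y", "y", "x", "x", "y", "x", "x", "y", "z"))

# Number of distinct A3 values within each (A, A2) group
res_uniqueN <- dt[, .(freq = uniqueN(A3)), by = .(A, A2)]

# Same idea in two steps: keep one row per (A, A2, A3), then count rows per group
res_unique_rows <- unique(dt, by = c("A", "A2", "A3"))[, .(freq = .N), by = .(A, A2)]
```

Both results list the groups in order of first appearance, which is why the data.table output is ordered differently from the dplyr one.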

Answer 3

You can use aggregate:

> aggregate(A3 ~ A+A2, data=df1, FUN=function(x) length(unique(x)))
  A A2 A3
1 b nt  1
2 a pq  1
3 b qp  1
4 b st  1
5 c st  3
6 a tv  1
7 a ut  2
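
Note that aggregate keeps the name A3 for the count column and orders the rows by A2. If the layout requested in the question is wanted, a small follow-up along these lines may help. This is only a sketch, and it rebuilds df1 as a plain data.frame so that it runs on its own.

```r
df1 <- data.frame(A  = c("a", "a", "a", "a", "a", "b", "b", "b", "c", "c", "c", "c"),
                  A2 = c("ut", "tv", "ut", "pq", "ut", "st", "qp", "nt", "st", "st", "st", "st"),
                  A3 = c("x", "y", "x", "y", "y", "x", "x", "y", "x", "x", "y", "z"),
                  stringsAsFactors = FALSE)

# Count distinct A3 values per (A, A2) combination
res <- aggregate(A3 ~ A + A2, data = df1, FUN = function(x) length(unique(x)))

# Rename the count column to freq, as in the question
names(res)[names(res) == "A3"] <- "freq"

# Sort by A, then A2, and reset the row names
res <- res[order(res$A, res$A2), ]
rownames(res) <- NULL
```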

Find the count of unique values of a third column for each category in a data frame
