Question

For the example data frame below, I need to count the distinct values in each column for each id.

    df <- data.frame(id = c(2,2,3,3,3,1,1,4,4),
                     prop1 = c("A","A","B","B","B","B","B","B","C"),
                     prop2 = c(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE),
                     prop3 = c(4,4,3,3,4,5,1,5,1))
    > df
      id prop1 prop2 prop3
    1  2     A FALSE     4
    2  2     A FALSE     4
    3  3     B FALSE     3
    4  3     B FALSE     3
    5  3     B FALSE     4
    6  1     B  TRUE     5
    7  1     B FALSE     1
    8  4     B  TRUE     5
    9  4     C FALSE     1

Base R is preferred.

Expected output format:

    > dfDistinctCountByProp
      id prop1.unq.cnt prop2.unq.cnt prop3.unq.cnt
    1  1             1             2             2
    2  2             1             1             1
    3  3             1             1             2
    4  4             2             2             2

Answer 1

You can use aggregate to group by id, with sum(!duplicated(x)) to count the distinct values in each column:

    aggregate(. ~ id, df, function(x){ sum(!duplicated(x)) })
    ##   id prop1 prop2 prop3
    ## 1  1     1     2     2
    ## 2  2     1     1     1
    ## 3  3     1     1     2
    ## 4  4     2     2     2

Or use length(unique(...)) if that makes more sense to you:

    aggregate(. ~ id, df, function(x){length(unique(x))}) # returns identical result

If you care, in dplyr it would be

    library(dplyr)

    df %>% group_by(id) %>% summarise_all(n_distinct)

or in data.table,

    library(data.table)

    setDT(df)[, lapply(.SD, uniqueN), by = id]
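
The expected output in the question uses column names with a .unq.cnt suffix, which none of the calls above produce on their own. A minimal base R sketch that adds them might look like the following. It uses the non-formula form of aggregate instead of the formula form shown earlier, and it assumes the suffix should go on every column except id; the result name dfDistinctCountByProp is taken from the question.

```r
# Same sample data as in the question.
df <- data.frame(id = c(2,2,3,3,3,1,1,4,4),
                 prop1 = c("A","A","B","B","B","B","B","B","C"),
                 prop2 = c(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE),
                 prop3 = c(4,4,3,3,4,5,1,5,1))

# Count the distinct values per id for every column except id itself.
dfDistinctCountByProp <- aggregate(df[-1], by = df["id"],
                                   FUN = function(x) length(unique(x)))

# Append the suffix from the expected output to the count columns
# (assumption: every non-id column gets the same suffix).
names(dfDistinctCountByProp)[-1] <-
  paste0(names(dfDistinctCountByProp)[-1], ".unq.cnt")

dfDistinctCountByProp
```

Passing df[-1] and df["id"] keeps the grouping column out of the counted columns, so the renaming step only has to skip the first column of the result.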
