For the example data frame below, I need to find, for each id, the count of distinct values in each column.
df <- data.frame(id = c(2,2,3,3,3,1,1,4,4),
                 prop1 = c("A","A","B","B","B","B","B","B","C"),
                 prop2 = c(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE),
                 prop3 = c(4,4,3,3,4,5,1,5,1))
> df
  id prop1 prop2 prop3
1  2     A FALSE     4
2  2     A FALSE     4
3  3     B FALSE     3
4  3     B FALSE     3
5  3     B FALSE     4
6  1     B  TRUE     5
7  1     B FALSE     1
8  4     B  TRUE     5
9  4     C FALSE     1
Base R preferred.
Expected output format:
> dfDistinctCountByProp
  id prop1.unq.cnt prop2.unq.cnt prop3.unq.cnt
1  1             1             2             2
2  2             1             1             1
3  3             1             1             2
4  4             2             2             2
Answer 0 (score: 2)
You can use aggregate, which lets you group by id, together with sum over !duplicated to count the distinct cases:
aggregate(. ~ id, df, function(x){ sum(!duplicated(x)) })
##   id prop1 prop2 prop3
## 1  1     1     2     2
## 2  2     1     1     1
## 3  3     1     1     2
## 4  4     2     2     2
or, if it makes more sense to you, use length(unique(...)):
aggregate(. ~ id, df, function(x){length(unique(x))}) # returns identical result
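If readers care, in dplyr it would be
library(dplyr)
df %>% group_by(id) %>% summarise_all(n_distinct)
# with dplyr >= 1.0.0, summarise(across(everything(), n_distinct)) is the equivalent
or in data.table,
library(data.table)
setDT(df)[, lapply(.SD, uniqueN), by = id]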
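The results above keep the original column names, while the expected output in the question uses a .unq.cnt suffix. A minimal base R sketch of one way to get those names follows. The is_prop helper variable and the choice to rename after aggregating are illustrative assumptions, not part of the original answer.

```r
# Example data from the question
df <- data.frame(id = c(2,2,3,3,3,1,1,4,4),
                 prop1 = c("A","A","B","B","B","B","B","B","C"),
                 prop2 = c(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,FALSE),
                 prop3 = c(4,4,3,3,4,5,1,5,1))

# Count distinct values per column within each id
dfDistinctCountByProp <- aggregate(. ~ id, df, function(x) length(unique(x)))

# Append the suffix to every column except the grouping column id
# (is_prop is a hypothetical helper name used only for this sketch)
is_prop <- names(dfDistinctCountByProp) != "id"
names(dfDistinctCountByProp)[is_prop] <-
  paste0(names(dfDistinctCountByProp)[is_prop], ".unq.cnt")

dfDistinctCountByProp
```

Renaming after the aggregation keeps the counting step identical to the answer above, so the same two lines should work regardless of which counting function is passed to aggregate.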