I have a data frame like this:
df <- data.frame(
SchoolID=c("A","A","B","B","C","D"),
Country=c("XX","XX","XX","YY","ZZ","ZZ"))
which gives me this data:
  SchoolID Country
1        A      XX
2        A      XX
3        B      XX
4        B      YY
5        C      ZZ
6        D      ZZ
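I want to know whether each SchoolID is assigned to exactly one Country, by counting the number of distinct Country values for each distinct SchoolID. So I would like to get a table like this: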
SchoolID NumberOfCountry
A 1
B 2
C 1
D 1
Answer 0 (score: 3)
aggregate(Country ~ SchoolID, df, function(x) length(unique(x)))
Or
tapply(df$Country, df$SchoolID, function(x) length(unique(x)))
Or
library(data.table)
setDT(df)[, .(NumberOfCountry = length(unique(Country))), by = SchoolID]
setDT(df)[, .(NumberOfCountry = uniqueN(Country)), by = SchoolID]
Or
library(dplyr)
df %>%
group_by(SchoolID) %>%
summarise(NumberOfCountry = n_distinct(Country))
Answer 1 (score: 1)
A method that does not depend on third-party libraries:
> as.data.frame(rowSums(table(df[!duplicated(df), ]), na.rm=T))
rowSums(table(df[!duplicated(df), ]), na.rm = T)
A 1
B 2
C 1
D 1
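The one-liner above can be read in three steps: drop duplicate rows, cross-tabulate SchoolID against Country, and sum each row. A minimal sketch of that reading, assuming the df from the question; the names distinct_pairs, counts, and is_unique are illustrative and not part of the original answer:

```r
# Same data frame as in the question
df <- data.frame(
  SchoolID = c("A", "A", "B", "B", "C", "D"),
  Country  = c("XX", "XX", "XX", "YY", "ZZ", "ZZ"))

# Step 1: keep each (SchoolID, Country) pair only once
distinct_pairs <- unique(df)

# Step 2: table() cross-tabulates the two columns, so each cell is
#         1 if the pair occurs and 0 otherwise
# Step 3: rowSums() adds up the cells per SchoolID, giving the number
#         of distinct countries for that school
counts <- rowSums(table(distinct_pairs))

# A school is uniquely assigned when it maps to exactly one country
# (illustrative extra step, not in the original answer)
is_unique <- counts == 1
```

Comparing the counts to 1 turns the table into a direct yes/no answer per school; with this data only B maps to more than one country.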
Answer 2 (score: -1)
Try this:
select SchoolID, count(Country) as NumberOfCountry
from (
    select distinct SchoolID, Country
    from tbl_stacko) temp
group by SchoolID