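(Other questions and answers elsewhere on this forum do not seem to handle the cross-border issue raised in this post.)
Suppose I have the following data: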
df <- data.frame(id=c("Eric", "John", "Sarah", "Simon", "Abdul", "Charlotte", "Alex", "Susan"),
state=c("CA", "AK", "NY", "NY", "NJ", "GA", "CA", "CA"),
project=c(1, 2, 2, 2, 3, 4, 5, 5), stringsAsFactors = FALSE)
> df
id state project
1 Eric CA 1
2 John AK 2
3 Sarah NY 2
4 Simon NY 2
5 Abdul NJ 3
6 Charlotte GA 4
7 Alex CA 5
8 Susan CA 5
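I want to get the average number of project members per state, also counting cross-border members.
To get the average with in-state members only, I did the following: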
dfx <- data.frame()
dfy <- data.frame()
for (j in unique(df$state)) {
  # rows for the current state only
  h <- subset(df, state == j)
  # number of in-state members on each project
  counts <- plyr::count(h, 'project')
  average_members <- mean(counts$freq)
  dfx <- data.frame(state = j,
                    average_members = average_members)
  dfy <- rbind(dfy, dfx)
}
> dfy
state average_members
1 CA 1.5
2 AK 1.0
3 NY 2.0
4 NJ 1.0
5 GA 1.0
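In my desired output, AK and NY should both get 3, because each ID there works on a project with two other IDs (even though they live in different states).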
> desired
state average_members
1 CA 1.5
2 AK 3.0
3 NY 3.0
4 NJ 1.0
5 GA 1.0
Does anyone know how to code this?
Answer 0 (score: 3)
library(data.table)
setDT(df)
df[, .(num_proj = .N), by = .(state, project)][, .(average_members = mean(num_proj)), by = state]
Result:
state average_members
1: CA 1.5
2: AK 1.0
3: NY 2.0
4: NJ 1.0
5: GA 1.0
For the second case, take state out of the grouping in the first step, so that each project is counted across all states:
unique(df[, .(state, num_proj = .N), by = project])[, .(average_members = mean(num_proj)), by = state]
state average_members
1: CA 1.5
2: AK 3.0
3: NY 3.0
4: NJ 1.0
5: GA 1.0
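The one-liner above can be unpacked into explicit steps, which makes it easier to see why AK and NY end up at 3. This is only a sketch: the names dt, project_size and pairs are made up for illustration, and the data is rebuilt so the block runs on its own.

```r
library(data.table)

# Same data as in the question, built directly as a data.table
dt <- data.table(
  id      = c("Eric", "John", "Sarah", "Simon", "Abdul", "Charlotte", "Alex", "Susan"),
  state   = c("CA", "AK", "NY", "NY", "NJ", "GA", "CA", "CA"),
  project = c(1, 2, 2, 2, 3, 4, 5, 5)
)

# Step 1: total head count per project, regardless of state
dt[, project_size := .N, by = project]

# Step 2: keep one row per (state, project) pair, so a project
# counts once per state even if several members live there
pairs <- unique(dt[, .(state, project, project_size)])

# Step 3: average the project sizes within each state
result <- pairs[, .(average_members = mean(project_size)), by = state]
print(result)
```

Project 2 has three members in total (one in AK, two in NY), so after step 2 both AK and NY carry a single row with a size of 3.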
Answer 1 (score: 2)
You can do this with the dplyr library. For the in-state averages, you can use
library(dplyr)
df %>% count(state, project) %>%
group_by(state) %>% summarize(avg=mean(n))
# state avg
# 1 AK 1.0
# 2 CA 1.5
# 3 GA 1.0
# 4 NJ 1.0
# 5 NY 2.0
You can get the cross-state result with
df %>% distinct(project, state) %>%
inner_join(df %>% count(project), by = "project") %>%
group_by(state) %>% summarize(avg=mean(n))
# state avg
# 1 AK 3.0
# 2 CA 1.5
# 3 GA 1.0
# 4 NJ 1.0
# 5 NY 3.0
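If you would rather avoid the join, a possible variant is to attach the project size to every row with add_count() first and then de-duplicate. This is a sketch under the assumption that a reasonably recent dplyr is installed; the column name avg follows the code above and the data is rebuilt so the block is self-contained.

```r
library(dplyr)

df <- data.frame(
  id      = c("Eric", "John", "Sarah", "Simon", "Abdul", "Charlotte", "Alex", "Susan"),
  state   = c("CA", "AK", "NY", "NY", "NJ", "GA", "CA", "CA"),
  project = c(1, 2, 2, 2, 3, 4, 5, 5),
  stringsAsFactors = FALSE
)

result <- df %>%
  add_count(project) %>%            # n = members on the project, across all states
  distinct(state, project, n) %>%   # one row per (state, project) pair
  group_by(state) %>%
  summarize(avg = mean(n))          # average project size per state

print(result)
```

The distinct() step matters: without it, NY would contribute project 2 twice and CA would contribute project 5 twice, which would skew the CA average.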
Answer 2 (score: 1)
df <- data.frame(id=c("Eric", "John", "Sarah", "Simon", "Abdul", "Charlotte", "Alex", "Susan"),
state=c("CA", "AK", "NY", "NY", "NJ", "GA", "CA", "CA"),
project=c(1, 2, 2, 2, 3, 4, 5, 5), stringsAsFactors = FALSE)
dfx <- data.frame()
dfy <- data.frame()
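for (j in unique(df$state)) {
  # rows for the current state only
  h <- subset(df, state == j)
  # projects with at least one member in this state
  thisStatesProjects <- unique(h[, "project"])
  # all members of those projects, whatever state they live in
  h2 <- subset(df, project %in% thisStatesProjects)
  average_members <- nrow(h2) / length(thisStatesProjects)
  dfx <- data.frame(state = j,
                    average_members = average_members)
  dfy <- rbind(dfy, dfx)
}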
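The same idea as the loop can probably be written without a loop in base R, using ave() for the per-project head count and aggregate() for the per-state mean. Treat this as a sketch rather than a drop-in replacement: project_size and pairs are illustrative names, and the rows come back sorted by state rather than in order of first appearance.

```r
df <- data.frame(
  id      = c("Eric", "John", "Sarah", "Simon", "Abdul", "Charlotte", "Alex", "Susan"),
  state   = c("CA", "AK", "NY", "NY", "NJ", "GA", "CA", "CA"),
  project = c(1, 2, 2, 2, 3, 4, 5, 5),
  stringsAsFactors = FALSE
)

# Members per project across all states, repeated on every row of that project
df$project_size <- ave(rep(1, nrow(df)), df$project, FUN = sum)

# One row per (state, project) pair, so each project counts once per state
pairs <- unique(df[, c("state", "project", "project_size")])

# Mean project size per state
result <- aggregate(project_size ~ state, data = pairs, FUN = mean)
names(result)[2] <- "average_members"
print(result)
```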