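This is my first question in this great community. I am trying to compute indices on a data.frame and display them by borough, or by neighborhood and concession. What code is best suited for this?
Here is a sample of my dataset. albo and aegyp are mosquito species, and each row (concession) is one inspected house. The house index (HI) is calculated as (number of positive houses / number of inspected houses) * 100, where a positive house is one in which at least one mosquito was found (value != 0). For example, HI = (7/11) * 100 = 63.63 overall (11 = number of inspected houses, 7 = total number of positive houses).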
borough  neighborhood  concession  albo  aegyp  Total_albo_aegyp
a1       mendong       1           1     5      6
a1       mendong       2           5     2      7
a1       mendong       3           2     1      3
a1       tam tam       4           0     0      0
a2       tam tam       5           4     6      10
a2       obili         6           0     1      1
a2       obili         7           0     0      0
a3       acacia        8           3     7      10
a4       melen         9           1     1      2
a4       melen         10          0     5      5
a4       polytech      11          8     0      10
HIcommune <- concessiondata %>%
  group_by(commune) %>%
  summarise(
    Mean = mean(concessiondata$total_aedes_albo_aegypti!=0),
    HIY = sum(concessiondata1$total_aedes_albo_aegypti!=0)/length(concessiondata1$total_aedes_albo_aegypti))
Houseindex_total <- concessiondata1[, Mean := mean(total_aedes_albo_aegypti!=0), by = "commune"]
## This is what the results should look like
borough  albo_HI  aegy_HI  Total_albo_aegyp_HI
a1       75       75       75
a2       33.33    66.66    66.66
a3       100      100      100
a4       66.66    66.66    100
Answer 0 (score: 1)
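First off, your code has several general coding/syntax issues.
1. You are mixing dplyr and data.table syntax.
2. You are using $ to index columns inside dplyr verbs. Inside a verb such as summarise, refer to columns by their bare names; an expression like concessiondata$col pulls the full column and ignores the grouping.
I recommend working through one of the many freely available tidyverse tutorials to learn the basics of reshaping and manipulating data with dplyr/tidyr.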
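To illustrate the second point, here is a minimal sketch using a small made-up data frame (toy, with columns g and x; none of these names come from the question). It assumes dplyr is installed. With $, every group gets the same value computed over the whole column, whereas the bare column name is evaluated per group.

```r
library(dplyr)

# Hypothetical toy data, for illustration only (not from the question)
toy <- data.frame(g = c("a", "a", "b", "b"), x = c(0, 1, 2, 3))

res <- toy %>%
  group_by(g) %>%
  summarise(
    with_dollar = mean(toy$x != 0),  # whole column: ignores the grouping
    bare_name   = mean(x != 0)       # only the rows of the current group
  )
res
```

Both groups end up with the same with_dollar value, which is most likely why the original attempt did not produce a distinct index per borough.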
That said, the following reproduces your expected output:
# Percentage of non-zero entries in x, i.e. the house index
calc_index <- function(x) sum(x != 0) / length(x) * 100

library(dplyr)
df %>%
  group_by(borough) %>%
  summarise(
    albo_HI = calc_index(albo),
    aegyp_HI = calc_index(aegyp),
    Total_albo_aegyp = calc_index(Total_albo_aegyp))
## A tibble: 4 x 4
# borough albo_HI aegyp_HI Total_albo_aegyp
# <fct> <dbl> <dbl> <dbl>
#1 a1 75 75 75
#2 a2 33.3 66.7 66.7
#3 a3 100 100 100
#4 a4 66.7 66.7 100
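The snippet above assumes the sample data is already loaded as df. For anyone wanting to reproduce it, the following sketch builds df by hand from the table in the question (values copied as posted) and cross-checks the per-borough indices with base R's aggregate, so it does not depend on dplyr. The object name hi is only an illustrative choice.

```r
# Sample data transcribed from the question, values copied as posted
df <- data.frame(
  borough = c("a1", "a1", "a1", "a1", "a2", "a2", "a2", "a3", "a4", "a4", "a4"),
  neighborhood = c("mendong", "mendong", "mendong", "tam tam", "tam tam",
                   "obili", "obili", "acacia", "melen", "melen", "polytech"),
  concession = 1:11,
  albo = c(1, 5, 2, 0, 4, 0, 0, 3, 1, 0, 8),
  aegyp = c(5, 2, 1, 0, 6, 1, 0, 7, 1, 5, 0),
  Total_albo_aegyp = c(6, 7, 3, 0, 10, 1, 0, 10, 2, 5, 10)
)

# Percentage of non-zero entries in x, i.e. the house index
calc_index <- function(x) sum(x != 0) / length(x) * 100

# Base R cross-check: apply calc_index to each count column within each borough
hi <- aggregate(cbind(albo, aegyp, Total_albo_aegyp) ~ borough,
                data = df, FUN = calc_index)
hi
```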
Alternatively, you can use summarise_all:
df %>%
  group_by(borough) %>%
  select(-neighborhood, -concession) %>%
  summarise_all(~calc_index(.x))
## A tibble: 4 x 4
# borough albo aegyp Total_albo_aegyp
# <fct> <dbl> <dbl> <dbl>
#1 a1 75 75 75
#2 a2 33.3 66.7 66.7
#3 a3 100 100 100
#4 a4 66.7 66.7 100
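As a side note, in dplyr 1.0.0 and later the scoped verbs such as summarise_all are superseded by across(). A sketch of the same idea with across() might look like the following; it assumes dplyr >= 1.0.0, and uses a hand-built subset of the question's data (boroughs a1 and a2 only, named df_small here purely for illustration) to stay self-contained.

```r
library(dplyr)

# Percentage of non-zero entries in x, i.e. the house index
calc_index <- function(x) sum(x != 0) / length(x) * 100

# Subset of the question's sample (boroughs a1 and a2); df_small is an illustrative name
df_small <- data.frame(
  borough = c("a1", "a1", "a1", "a1", "a2", "a2", "a2"),
  albo = c(1, 5, 2, 0, 4, 0, 0),
  aegyp = c(5, 2, 1, 0, 6, 1, 0),
  Total_albo_aegyp = c(6, 7, 3, 0, 10, 1, 0)
)

out <- df_small %>%
  group_by(borough) %>%
  summarise(across(c(albo, aegyp, Total_albo_aegyp), calc_index,
                   .names = "{.col}_HI"))  # append _HI to each output column
out
```

Naming the columns explicitly inside across() also removes the need for the select() step, since only the listed columns are summarised.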