How do I calculate an index in R?

Asked: 2019-04-26 00:32:26

Tags: r indexing group-by row

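This is my first time asking a question in this great community. I am trying to calculate indices on a data.frame and display them by borough, or by neighborhood and concession. Which code is best suited for this?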

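Here is a sample of my dataset. albo, aegy = mosquito species; house = expected houses. The house index (HI) is calculated as (number of positive houses / number of expected houses) * 100. A positive house is one where at least one mosquito was found (value != 0). For example, HI = (7/11) * 100 = 63.63 overall (11 = number of expected houses, 7 = total number of positive houses).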
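As a quick arithmetic check of the formula, here is a minimal base R sketch. It assumes the 7/11 figure refers to the albo column of the sample table below, which has 7 non-zero values out of 11 rows; the variable names are illustrative only.

```r
# albo counts for the 11 concessions in the sample table
albo <- c(1, 5, 2, 0, 4, 0, 0, 3, 1, 0, 8)

positive <- sum(albo != 0)   # houses with at least one mosquito: 7
expected <- length(albo)     # total number of houses: 11

hi <- positive / expected * 100
hi
# [1] 63.63636
```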


borough neighborhood    concession  albo    aegyp   Total_albo_aegyp
a1  mendong                1         1        5            6
a1  mendong                2         5        2            7
a1  mendong                3         2        1            3
a1  tam tam                4         0        0            0
a2  tam tam                5         4        6            10
a2  obili                  6         0        1             1
a2  obili                  7         0        0             0
a3  acacia                 8         3        7             10
a4  melen                  9         1        1             2
a4  melen                  10        0        5             5
a4  polytech               11        8        0             10

HIcommune <- concessiondata %>% 
  group_by(commune) %>% 
  summarise(
  Mean = mean(concessiondata$total_aedes_albo_aegypti!=0),
  HIY = sum(concessiondata1$total_aedes_albo_aegypti!=0)/length(concessiondata1$total_aedes_albo_aegypti))

  Houseindex_total <- concessiondata1[, Mean := mean(total_aedes_albo_aegypti!=0), by = "commune"]


  ## This is what the results should look like

borough   albo_HI   aegy_HI   Total_albo_aegyp_HI
a1        75        75        75
a2        33.33     66.66     66.66
a3        100       100       100
a4        66.66     66.66     100

1 answer:

Answer 0 (score: 1)

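First, your code has a few general coding/syntax issues.

  1. I advise against mixing dplyr and data.table syntax.
  2. You don't need to index columns with $ inside dplyr verbs.

I recommend working through one of the many freely available tidyverse tutorials to learn the basics of reshaping and manipulating data with dplyr / tidyr.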
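Point 2 matters for correctness, not just style. The sketch below uses a hypothetical toy data frame (toy, g and x are made-up names) to show that a bare column name inside summarise() is evaluated per group, while toy$x always refers to the whole column and so ignores the grouping.

```r
library(dplyr)

# Hypothetical toy data: group "a" has one zero, group "b" has none
toy <- data.frame(
  g = c("a", "a", "b", "b"),
  x = c(1, 0, 2, 3)
)

res <- toy %>%
  group_by(g) %>%
  summarise(
    per_group = mean(x != 0) * 100,      # bare name: evaluated within each group
    whole_col = mean(toy$x != 0) * 100   # toy$x: the full column, same value for every group
  )

res
# per_group is 50 for "a" and 100 for "b"; whole_col is 75 for both
```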

That said, the following reproduces your expected output:

calc_index <- function(x) sum(x != 0) / length(x) * 100

library(dplyr)
df %>%
    group_by(borough) %>%
    summarise(
        albo_HI = calc_index(albo),
        aegyp_HI = calc_index(aegyp),
        Total_albo_aegyp = calc_index(Total_albo_aegyp))
## A tibble: 4 x 4
#  borough albo_HI aegyp_HI Total_albo_aegyp
#  <fct>     <dbl>    <dbl>            <dbl>
#1 a1         75       75               75
#2 a2         33.3     66.7             66.7
#3 a3        100      100              100
#4 a4         66.7     66.7            100
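The snippet above assumes the sample table is already loaded as df. To make it reproducible, here is a minimal sketch that builds df by hand from the values in the question (copied as posted) and re-runs the same summary.

```r
library(dplyr)

# Sample data transcribed from the question's table
df <- data.frame(
  borough = c("a1", "a1", "a1", "a1", "a2", "a2", "a2", "a3", "a4", "a4", "a4"),
  neighborhood = c("mendong", "mendong", "mendong", "tam tam", "tam tam",
                   "obili", "obili", "acacia", "melen", "melen", "polytech"),
  concession = 1:11,
  albo = c(1, 5, 2, 0, 4, 0, 0, 3, 1, 0, 8),
  aegyp = c(5, 2, 1, 0, 6, 1, 0, 7, 1, 5, 0),
  Total_albo_aegyp = c(6, 7, 3, 0, 10, 1, 0, 10, 2, 5, 10)
)

# Share of non-zero values, as a percentage
calc_index <- function(x) sum(x != 0) / length(x) * 100

res <- df %>%
  group_by(borough) %>%
  summarise(
    albo_HI = calc_index(albo),
    aegyp_HI = calc_index(aegyp),
    Total_albo_aegyp = calc_index(Total_albo_aegyp))

res
```

With R 4.0 or later, data.frame() keeps strings as character vectors by default, so borough prints as <chr> rather than the <fct> shown in the output above; the computed values are the same.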

Alternatively, you can use summarise_all:

df %>%
    group_by(borough) %>%
    select(-neighborhood, -concession) %>%
    summarise_all(~calc_index(.x))
## A tibble: 4 x 4
#  borough  albo aegyp Total_albo_aegyp
#  <fct>   <dbl> <dbl>            <dbl>
#1 a1       75    75               75
#2 a2       33.3  66.7             66.7
#3 a3      100   100              100
#4 a4       66.7  66.7            100
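In dplyr 1.0.0 and later, summarise_all() is superseded by across(). The following is a sketch of the equivalent call, not part of the original answer; it assumes dplyr >= 1.0.0 and rebuilds only the columns it needs so that it runs on its own.

```r
library(dplyr)

# Share of non-zero values, as a percentage
calc_index <- function(x) sum(x != 0) / length(x) * 100

# Count columns from the question's sample table (identifier columns omitted)
df <- data.frame(
  borough = c("a1", "a1", "a1", "a1", "a2", "a2", "a2", "a3", "a4", "a4", "a4"),
  albo = c(1, 5, 2, 0, 4, 0, 0, 3, 1, 0, 8),
  aegyp = c(5, 2, 1, 0, 6, 1, 0, 7, 1, 5, 0),
  Total_albo_aegyp = c(6, 7, 3, 0, 10, 1, 0, 10, 2, 5, 10)
)

res <- df %>%
  group_by(borough) %>%
  # Apply calc_index to each named column within each borough
  summarise(across(c(albo, aegyp, Total_albo_aegyp), calc_index))

res
```

Naming the columns inside across() also removes the need for the select() step, since neighborhood and concession are never touched.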