Subsetting data in R based on a group_by calculation

Time: 2017-11-07 22:54:39

Tags: r loops subset

This is a follow-up to the question here.

I have the following data.

Data:

df = structure(list(Org_ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L), 
    Market_volume = c(100L, 200L, 300L, 50L, 500L, 400L, 200L, 
    300L, 100L), Indicator_variable = c(1L, 0L, 0L, 1L, 1L, 0L, 
    0L, 0L, 0L), variable3 = c(10L, 1L, 1L, 4L, 2L, 3L, 3L, 10L, 3L), 
    variable4 = c(2L, 1L, 1L, 7L, 2L, 3L, 3L, 8L, 3L)), .Names = c("Org_ID", 
    "Market_volume", "Indicator_variable", "Var3", "Var4"), 
    class = "data.frame", row.names = c(NA, -9L))

Using library(dplyr), I calculated the percentage of NAs by market volume for each Org_ID with the following code:

library(dplyr)

df %>%
  group_by(Org_ID) %>%
  summarize(sum_market_vol = sum(Market_volume*!Indicator_variable),
            tot_market_vol = sum(Market_volume)) %>%
  transmute(Org_ID, Perc_Market_Vol = 100*sum_market_vol/tot_market_vol)

Result:

# A tibble: 3 x 2
  Org_ID Perc_Market_Vol
   <int>           <dbl>
1      1        83.33333
2      2         0.00000
3      3       100.00000

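Question: I would like to subset the original data by removing all rows of an Org_ID (say, 2) if Perc_Market_Vol < 30. That is, I do not want to remove individual rows within an Org_ID, but the entire Org_ID, i.e. every row where, say, Org_ID = 1 or Org_ID = 2. How can I link the two tables, or do the subsetting in a single function?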

I would like the new data to look like this:

df1 = structure(list(Org_ID = c(1L, 1L, 1L, 3L, 3L, 3L, 3L), 
    Market_volume = c(100L, 200L, 300L, 400L, 200L, 
    300L, 100L), Indicator_variable = c(1L, 0L, 0L, 0L, 
    0L, 0L, 0L), variable3 = c(10L, 1L, 1L, 3L, 3L, 10L, 3L), 
    variable4 = c(2L, 1L, 1L, 3L, 3L, 8L, 3L)), .Names = c("Org_ID", 
    "Market_volume", "Indicator_variable", "Var3", "Var4"), 
    class = "data.frame", row.names = c(NA, -7L))

1 answer:

Answer 0: (score: 0)

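You can use group_by %>% filter to filter without materializing the summarized data frame. Inside filter you can compute the summary condition for each group, and every row of a group is kept or dropped together: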

df %>% 
    group_by(Org_ID) %>% 
    filter(sum(Market_volume * !Indicator_variable)/sum(Market_volume) > 0.3)

# A tibble: 7 x 5
# Groups:   Org_ID [2]
#  Org_ID Market_volume Indicator_variable  Var3  Var4
#   <int>         <int>              <int> <int> <int>
#1      1           100                  1    10     2
#2      1           200                  0     1     1
#3      1           300                  0     1     1
#4      3           400                  0     3     3
#5      3           200                  0     3     3
#6      3           300                  0    10     8
#7      3           100                  0     3     3
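
Since the question also asks how to link the two tables, here is a minimal sketch of that route: build the summary table, keep the organisations that pass the threshold, and use semi_join to retain only their rows in the original data. The names perc and keep are illustrative and not from the original post, and the sketch assumes the cutoff is "keep when Perc_Market_Vol >= 30", following the question's wording (the filter above uses a strict > 0.3, which gives the same rows for this data).

```r
library(dplyr)

# Same example data as in the question, rebuilt with data.frame()
df <- data.frame(
  Org_ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
  Market_volume = c(100L, 200L, 300L, 50L, 500L, 400L, 200L, 300L, 100L),
  Indicator_variable = c(1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L),
  Var3 = c(10L, 1L, 1L, 4L, 2L, 3L, 3L, 10L, 3L),
  Var4 = c(2L, 1L, 1L, 7L, 2L, 3L, 3L, 8L, 3L)
)

# Step 1: one row per Org_ID with its percentage (hypothetical name: perc)
perc <- df %>%
  group_by(Org_ID) %>%
  summarize(Perc_Market_Vol = 100 * sum(Market_volume * !Indicator_variable) /
              sum(Market_volume))

# Step 2: organisations that meet the assumed 30% threshold (hypothetical name: keep)
keep <- perc %>%
  filter(Perc_Market_Vol >= 30)

# Step 3: semi_join keeps the rows of df whose Org_ID appears in keep,
# without adding any columns from keep
df1 <- semi_join(df, keep, by = "Org_ID")
```

This version is longer than the grouped filter, but it leaves the summary table available for inspection and returns an ungrouped data frame. If you prefer the one-step filter, adding ungroup() at the end of the pipe removes the grouping from the result.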