How to divide a group's value by the group total in dplyr

Time: 2017-05-29 06:58:38

Tags: r

My data frame is as follows:

fund_name       Industry     quantity      month
 ABC              IT           20          201704
 ABC              IT           20          201704
 ABC              Industrials  30          201704
 ABC              Auto         40          201704
 ABC              Pharma       50          201704
 DEF              IT           20          201704
 DEF              Auto         35          201704
 DEF              Auto         35          201704
 DEF              Pharma       40          201704

What I want to calculate is each industry's share of a fund's total quantity. For example, for fund ABC, the IT industry's contribution in month 201704 is 40 / (40 + 30 + 40 + 50) = 0.25, i.e. 25%.

The desired data frame should look like this:

fund_name       Industry       quantity                  month
 ABC              IT           40/(40+30+40+50)          201704
 ABC              Industrials  30/(40+30+40+50)          201704
 ABC              Auto         40/(40+30+40+50)          201704
 ABC              Pharma       50/(40+30+40+50)          201704
 DEF              IT           20/(20+70+40)             201704
 DEF              Auto         70/(20+70+40)             201704
 DEF              Pharma       40/(20+70+40)             201704

My attempt so far only gave me the sum of the quantities.

How can I achieve this in dplyr?

2 Answers:

Answer 0: (score: 1)

Here is one of several ways:

df <- read.table(header=TRUE, text="fund_name       Industry     quantity      month
ABC              IT           20          201704
ABC              Industrials  30          201704
ABC              Auto         40          201704
ABC              Pharma       50          201704
DEF              IT           20          201704 
DEF              Auto         35          201704
DEF              Pharma       40          201704")
df

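library(dplyr)

# Total quantity per fund
fund_totals <- df %>%
  group_by(fund_name) %>%
  summarize(quantity_sum = sum(quantity))

# Attach each fund's total to its rows, turn quantity into a share,
# then drop the helper column
want <- df %>%
  left_join(fund_totals, by = "fund_name") %>%
  mutate(quantity = quantity / quantity_sum) %>%
  select(-quantity_sum)
want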
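
The join is not strictly needed, because a grouped mutate() can see each fund's total directly. The snippet below is a minimal sketch of that variant, not part of the original answer; it recreates the same df so it runs on its own, and want2 is just an illustrative name. Like the join version, it assumes one row per fund and industry and does not merge duplicate industry rows.

```r
library(dplyr)

# Same sample data as in the answer above
df <- read.table(header = TRUE, text = "fund_name Industry quantity month
ABC IT 20 201704
ABC Industrials 30 201704
ABC Auto 40 201704
ABC Pharma 50 201704
DEF IT 20 201704
DEF Auto 35 201704
DEF Pharma 40 201704")

# Within each fund, divide every row's quantity by the fund's total
want2 <- df %>%
  group_by(fund_name) %>%
  mutate(quantity = quantity / sum(quantity)) %>%
  ungroup()
want2
```

After this step the quantity column holds shares, so the values within each fund add up to 1.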

Answer 1: (score: 0)

The following R code gave me what I wanted:

industry_composition <- final_reliance_MF %>% 
   group_by(fund_names,Industry,Month) %>% 
   summarise(total_quant = sum(Quantity)) %>% 
   group_by(fund_names,Month) %>% 
   mutate(perc = (total_quant/sum(total_quant))*100) %>% 
   as.data.frame()
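
To try this approach without the original final_reliance_MF object, here is a self-contained sketch of the same two-step grouping. It assumes the sample rows and column names from the question (fund_name, Industry, quantity, month) rather than the ones in the answer, and holdings and composition are illustrative names.

```r
library(dplyr)

# Sample data from the question, including the duplicate industry rows
holdings <- read.table(header = TRUE, text = "fund_name Industry quantity month
ABC IT 20 201704
ABC IT 20 201704
ABC Industrials 30 201704
ABC Auto 40 201704
ABC Pharma 50 201704
DEF IT 20 201704
DEF Auto 35 201704
DEF Auto 35 201704
DEF Pharma 40 201704")

composition <- holdings %>%
  # Step 1: collapse duplicate rows into one total per fund, industry and month
  group_by(fund_name, Industry, month) %>%
  summarise(total_quant = sum(quantity)) %>%
  # Step 2: regroup at fund/month level and express each total as a percentage
  group_by(fund_name, month) %>%
  mutate(perc = total_quant / sum(total_quant) * 100) %>%
  as.data.frame()
composition
```

On this sample, the two ABC IT rows add up to 40 out of a fund total of 160, so perc for that row is 25. Regrouping before mutate() is what makes sum(total_quant) a per-fund total rather than a per-industry one.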