Add a row for the ratio of two variables

Time: 2018-02-01 05:21:05

Tags: r dplyr plyr

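For each combination of DVID and FORM, I want to add a row to my data frame that holds the Fed/fasted ratio (as a percentage) of median and gmean.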

dfin <- 
DVID   FORM   FED    median   gmean    CV
 1      A     fast    15       20      10
 1      A     Fed     30       40      15 
 1      B     fast    40       60      20
 1      B     Fed     50       100     25

mydfout <- 
DVID   FORM   FED          median   gmean     CV
 1      A     fast           15       20      10
 1      A     Fed            30       40      15
 1      A     Fed/Fasted(%)  200      200     NA
 1      B     fast           40       60      20
 1      B     Fed            50       100     25
 1      B     Fed/Fasted(%)  125      166.6   NA

How can I do this in R?

3 answers:

Answer 0 (score: 6)

We can do this with base R functions:

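# dat1 is the question's data frame (dfin); each group must list its "fast" row before its "Fed" row
# x[2] / x[1] is therefore Fed / fast, expressed as a percentage
A <- aggregate(cbind(median, gmean) ~ DVID + FORM, dat1, function(x) x[2] / x[1] * 100)
B <- transform(A, FED = "Fed/Fasted%", CV = NA)  # label the ratio rows; CV is left as NA
# interleave each group's original rows with its ratio row
do.call(rbind, Map(rbind, split(dat1, dat1[1:2]), split(B, B[1:2])))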
      DVID FORM         FED median    gmean CV
1.A.1    1    A        fast     15  20.0000 10
1.A.2    1    A         Fed     30  40.0000 15
1.A.3    1    A Fed/Fasted%    200 200.0000 NA
1.B.3    1    B        fast     40  60.0000 20
1.B.4    1    B         Fed     50 100.0000 25
1.B.2    1    B Fed/Fasted%    125 166.6667 NA
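
The `x[2] / x[1]` step above is positional, so it only gives Fed/fast when the "fast" row comes first in every group. The following is a minimal sketch of one way to guard against that, assuming the same column names as the question: sort the rows first, then run the same three steps. The shuffled `dat1` and the name `out` are made up for illustration.

```r
# Hypothetical copy of the question's data with the rows deliberately shuffled
dat1 <- data.frame(
  DVID   = c(1L, 1L, 1L, 1L),
  FORM   = c("A", "B", "A", "B"),
  FED    = c("Fed", "fast", "fast", "Fed"),
  median = c(30, 40, 15, 50),
  gmean  = c(40, 60, 20, 100),
  CV     = c(15, 20, 10, 25),
  stringsAsFactors = FALSE
)

# Sort so that "fast" precedes "Fed" within every DVID/FORM group
# (FED != "fast" is FALSE for the fasted row, and FALSE sorts before TRUE)
dat1 <- dat1[order(dat1$DVID, dat1$FORM, dat1$FED != "fast"), ]

# Step 1: one ratio per group and column; x[1] is now fasted and x[2] is fed
A <- aggregate(cbind(median, gmean) ~ DVID + FORM, dat1,
               function(x) x[2] / x[1] * 100)

# Step 2: add the label column and an empty CV so the columns match dat1
B <- transform(A, FED = "Fed/Fasted%", CV = NA)

# Step 3: append each group's ratio row to that group's original rows
out <- do.call(rbind, Map(rbind, split(dat1, dat1[1:2]), split(B, B[1:2])))
rownames(out) <- NULL  # drop the composite row names such as "1.A.1"
out
```

Sorting up front keeps the answer's aggregate call unchanged. The sketch still assumes exactly one "fast" and one "Fed" row per group.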

Answer 1 (score: 2)

A simple approach is to compute all the aggregates and then row-bind them back onto the original data frame. In dplyr:

library(dplyr)

df_in <- data.frame(DVID = c(1L, 1L, 1L, 1L), 
                 FORM = c("A", "A", "B", "B"), 
                 FED = c("fast", "Fed", "fast", "Fed"), 
                 median = c(15L, 30L, 40L, 50L), 
                 gmean = c(20L, 40L, 60L, 100L), 
                 CV = c(10L, 15L, 20L, 25L),
                 stringsAsFactors = FALSE)

df_out <- df_in %>% 
    group_by(DVID, FORM) %>% 
    summarise_at(vars(median, gmean), 
                 funs(.[FED == 'Fed'] / .[FED == 'fast'] * 100)) %>% 
    mutate(FED = 'Fed/Fasted(%)', 
           CV = NA) %>% 
    bind_rows(df_in) %>% 
    select(1:2, 5, 3:4, 6) %>% arrange(DVID, FORM, FED) %>% ungroup()    # make it pretty

df_out
#> # A tibble: 6 x 6
#>    DVID FORM  FED           median gmean    CV
#>   <int> <chr> <chr>          <dbl> <dbl> <int>
#> 1     1 A     fast            15.0  20.0    10
#> 2     1 A     Fed             30.0  40.0    15
#> 3     1 A     Fed/Fasted(%)  200   200      NA
#> 4     1 B     fast            40.0  60.0    20
#> 5     1 B     Fed             50.0 100      25
#> 6     1 B     Fed/Fasted(%)  125   167      NA
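
Note that `funs()` has been deprecated since dplyr 0.8.0, so the code above may emit warnings on current installations. The following is a sketch of the same idea written with `across()`, assuming dplyr 1.0.0 or later. The intermediate names `ratios` and `fed_levels` are made up for illustration. The final sort uses an explicit level order rather than sorting the FED strings, because string sort order can depend on the locale.

```r
library(dplyr)

df_in <- data.frame(DVID = c(1L, 1L, 1L, 1L),
                    FORM = c("A", "A", "B", "B"),
                    FED = c("fast", "Fed", "fast", "Fed"),
                    median = c(15L, 30L, 40L, 50L),
                    gmean = c(20L, 40L, 60L, 100L),
                    CV = c(10L, 15L, 20L, 25L),
                    stringsAsFactors = FALSE)

# One row per DVID/FORM group holding Fed / fast * 100 for each measure
ratios <- df_in %>%
  group_by(DVID, FORM) %>%
  summarise(across(c(median, gmean),
                   ~ .x[FED == "Fed"] / .x[FED == "fast"] * 100),
            .groups = "drop") %>%
  mutate(FED = "Fed/Fasted(%)", CV = NA_integer_)

# Desired order of the FED labels within each group (illustrative helper)
fed_levels <- c("fast", "Fed", "Fed/Fasted(%)")

df_out <- bind_rows(df_in, ratios) %>%
  arrange(DVID, FORM, match(FED, fed_levels)) %>%
  select(DVID, FORM, FED, median, gmean, CV)

df_out
```

Selecting columns by name instead of by position (`select(1:2, 5, 3:4, 6)`) is a design choice here: it keeps working if the column order of the intermediate result changes.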

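Answer 2 (score: 1)

One way in base R is to split the data frame by DVID and FORM and, within each group, compute the Fed/fast ratio for median and gmean. The new row takes DVID and FORM from the first value in the group and assigns NA to CV.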

do.call(rbind,
    lapply(split(dfin, list(dfin$DVID, dfin$FORM)), function(x)
        rbind(x, data.frame(
            DVID = x[[1]][1], FORM = x[[2]][1], FED = "Fed/Fasted(%)",
            median = (x[["median"]][x[["FED"]] == "Fed"] / x[["median"]][x[["FED"]] == "fast"]) * 100,
            gmean = (x[["gmean"]][x[["FED"]] == "Fed"] / x[["gmean"]][x[["FED"]] == "fast"]) * 100,
            CV = NA))))



#DVID FORM           FED median    gmean CV
#   1    A          fast     15  20.0000 10
#   1    A           Fed     30  40.0000 15
#   1    A Fed/Fasted(%)    200 200.0000 NA
#   1    B          fast     40  60.0000 20
#   1    B           Fed     50 100.0000 25
#   1    B Fed/Fasted(%)    125 166.6667 NA
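
The split-and-rbind approach above repeats the ratio expression for each column, and `split()` on two factors also creates empty groups for DVID/FORM combinations that never occur, which would likely make the `data.frame()` call fail. The following is a sketch of one way to tidy that up, assuming one "fast" and one "Fed" row per observed group. The helpers `pct_ratio` and `add_ratio_row` and the two-DVID sample data are hypothetical, not part of the original answer.

```r
# Hypothetical data in which not every DVID/FORM combination occurs
dfin <- data.frame(
  DVID   = c(1L, 1L, 2L, 2L),
  FORM   = c("A", "A", "B", "B"),
  FED    = c("fast", "Fed", "fast", "Fed"),
  median = c(15, 30, 40, 50),
  gmean  = c(20, 40, 60, 100),
  CV     = c(10, 15, 20, 25),
  stringsAsFactors = FALSE
)

# Fed value divided by fasted value, as a percentage (illustrative helper)
pct_ratio <- function(values, fed) {
  values[fed == "Fed"] / values[fed == "fast"] * 100
}

# Append the ratio row to one DVID/FORM group (illustrative helper)
add_ratio_row <- function(x) {
  rbind(x, data.frame(
    DVID   = x$DVID[1],
    FORM   = x$FORM[1],
    FED    = "Fed/Fasted(%)",
    median = pct_ratio(x$median, x$FED),
    gmean  = pct_ratio(x$gmean, x$FED),
    CV     = NA,
    stringsAsFactors = FALSE
  ))
}

# drop = TRUE discards the empty 2.A and 1.B combinations
groups <- split(dfin, list(dfin$DVID, dfin$FORM), drop = TRUE)
out <- do.call(rbind, lapply(groups, add_ratio_row))
rownames(out) <- NULL
out
```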