Calculation by category

Time: 2017-02-22 15:23:49

Tags: r dplyr aggregate

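This may be basic, but I have been trying to figure it out for days and have not found an answer.

I am trying to compute a new quantity from two columns, 'concentration' and 'area', grouped by 'catchment'. I have written a function that, for each row, takes the difference in concentration from the row with the largest area and scales it by the row's area as a proportion of the largest area in that catchment, but I cannot get it to work with dplyr or aggregate. It works with by, but that returns a list.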

Ideally, I would like to add a column to the data frame, or replace the concentration column entirely. Here is the data frame 'lev':

  area catchment concentration
1    1       Yup       2.00000
2   10       Yup      40.50000
3   25       Yup      50.82031
4   35       Yup      50.00000
5    1      Nope       1.00000
6   10      Nope       5.00000
7   25      Nope      40.08333
8   35      Nope      38.00000

Here is the function:

lever <- function(data=lev, x=data[,"concentration"], y=data[,"area"]){
N= which.max(y)
L = (x - x[N]) * y/max(y)
return(L)}

Here is the desired result:

  area catchment concentration   leverage
1    1       Yup       2.00000 -1.3714286
2   10       Yup      40.50000 -2.7142857
3   25       Yup      50.82031  0.5859375
4   35       Yup      50.00000  0.0000000
5    1      Nope       1.00000 -1.0571429
6   10      Nope       5.00000 -9.4285714
7   25      Nope      40.08333  1.4880952
8   35      Nope      38.00000  0.0000000

Using by, I can get two lists containing the results for each catchment:

by(lev, lev$catchment, lever)

But I would like to use the function on multiple columns grouped by several factors (for example, date in addition to catchment), and I get an 'incorrect number of dimensions' error with doBy.

3 Answers:

Answer 0: (score: 1)

We can use tidyverse

library(tidyverse)
df1 %>% 
  group_by(catchment) %>%
  mutate(leverage = (concentration- concentration[which.max(area)]) * area/max(area))

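Based on the description, if there are several grouping variables, put them all in group_by; the calculation can also be applied to multiple columns with mutate_each.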
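As a hedged sketch of that last remark: mutate_each has since been deprecated in dplyr, and with dplyr 1.0.0 or later the same idea is usually written with across(). The date column and the second measurement column conc_b below are hypothetical, added only to illustrate grouping by two variables and transforming two columns at once.

```r
library(dplyr)

# Example data: `date` and `conc_b` are hypothetical columns added for illustration
df1 <- data.frame(
  area      = rep(c(1, 10, 25, 35), times = 2),
  catchment = rep(c("Yup", "Nope"), each = 4),
  date      = rep(as.Date(c("2017-01-01", "2017-02-01")), times = 4),
  conc_a    = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38),
  conc_b    = c(1, 2, 3, 4, 5, 6, 7, 8)
)

res <- df1 %>%
  group_by(catchment, date) %>%            # several grouping variables
  mutate(across(c(conc_a, conc_b),         # several columns at once
                ~ (.x - .x[which.max(area)]) * area / max(area),
                .names = "leverage_{.col}")) %>%
  ungroup()
```

Each selected column gets its own leverage_ column, computed within every catchment/date combination; the original columns are left untouched.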

Answer 1: (score: 1)

Load your data:

lev <- read.table(text = "area catchment concentration
    1       Yup       2.00000
   10       Yup      40.50000
   25       Yup      50.82031
   35       Yup      50.00000
    1      Nope       1.00000
   10      Nope       5.00000
   25      Nope      40.08333
   35      Nope      38.00000", 
   header=TRUE)

Group by catchment

library(dplyr)
lev %>% 
    group_by(catchment) %>% 
    mutate(N = which.max(area),
           L = (concentration - concentration[N]) * area/max(area))

# 
#    area catchment concentration     N          L
#   <int>    <fctr>         <dbl> <int>      <dbl>
# 1     1       Yup       2.00000     4 -1.3714286
# 2    10       Yup      40.50000     4 -2.7142857
# 3    25       Yup      50.82031     4  0.5859357
# 4    35       Yup      50.00000     4  0.0000000
# 5     1      Nope       1.00000     4 -1.0571429
# 6    10      Nope       5.00000     4 -9.4285714
# 7    25      Nope      40.08333     4  1.4880929
# 8    35      Nope      38.00000     4  0.0000000
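The values in the L column can be checked by hand for a single catchment with base R. The sketch below reproduces the 'Yup' rows; the variable names are only illustrative. N is 4 in both groups because which.max is evaluated within each group, and the row with the largest area happens to be the fourth row of each catchment.

```r
# Values for the 'Yup' catchment, copied from the data above
area          <- c(1, 10, 25, 35)
concentration <- c(2, 40.5, 50.82031, 50)

N <- which.max(area)   # position of the row with the largest area within the group
L <- (concentration - concentration[N]) * area / max(area)
round(L, 7)
```

The third value differs in its last digits from the 0.5859375 shown in the question, presumably because the concentration printed there was rounded.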

Using your function

I modified your function so that it returns a data frame.

lever2 <- function(data, 
                   x = data[["concentration"]], 
                   y = data[["area"]]){
    # Use [[ ]] to extract each column as a plain vector
    # (this works for both data frames and tibbles)
    N <- which.max(y)
    L <- (x - x[N]) * y/max(y)
    # Put L back into the data frame 
    # so that we keep the concentration and area in the result
    data$L <- L
    return(data)
    }

The function can then be used with dplyr::group_by %>% do:

lev %>% 
    group_by(catchment) %>% 
    do( lever2(.))
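The same split-apply-combine step can also be sketched in base R, which shows how a per-group list like the one returned by by() in the question can be turned back into a column. The data frame is recreated here so the block is self-contained, and lever_vec is a hypothetical helper name, not part of the original code.

```r
lev <- data.frame(
  area          = rep(c(1, 10, 25, 35), times = 2),
  catchment     = rep(c("Yup", "Nope"), each = 4),
  concentration = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38)
)

# Hypothetical helper: takes one group's data frame, returns a numeric vector
lever_vec <- function(d) {
  N <- which.max(d$area)
  (d$concentration - d$concentration[N]) * d$area / max(d$area)
}

# Split into one data frame per catchment, apply the helper to each,
# then let unsplit() put the results back in the original row order.
# To group by more than one factor, a list of factors (for example
# list(lev$catchment, lev$date), with `date` hypothetical) can be passed
# to both split() and unsplit().
groups <- split(lev, lev$catchment)
lev$leverage <- unsplit(lapply(groups, lever_vec), lev$catchment)
```

The intermediate result of lapply() is a list with one vector per catchment, and unsplit() is what maps it back onto the rows of lev.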

Answer 2: (score: 1)

You can also use data.table to compute this:

library(data.table)
# convert to data.table
setDT(df)

df[, leverage := (concentration - concentration[which.max(area)]) * (area / max(area)),
   by=catchment]
df
   area catchment concentration   leverage
1:    1       Yup       2.00000 -1.3714286
2:   10       Yup      40.50000 -2.7142857
3:   25       Yup      50.82031  0.5859357
4:   35       Yup      50.00000  0.0000000
5:    1      Nope       1.00000 -1.0571429
6:   10      Nope       5.00000 -9.4285714
7:   25      Nope      40.08333  1.4880929
8:   35      Nope      38.00000  0.0000000
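To group by more than one variable, as the question asks, by accepts several columns. A minimal sketch, assuming a hypothetical date column that is not in the original data:

```r
library(data.table)

# `date` is a hypothetical second grouping column added for illustration
dt <- data.table(
  area          = rep(c(1, 10, 25, 35), times = 2),
  catchment     = rep(c("Yup", "Nope"), each = 4),
  date          = rep(c("2017-01", "2017-02"), times = 4),
  concentration = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38)
)

# Same formula, evaluated within each catchment/date combination;
# := adds the column by reference and keeps the original row order
dt[, leverage := (concentration - concentration[which.max(area)]) * (area / max(area)),
   by = .(catchment, date)]
```

With this made-up date column each group holds two rows, so the reference row is the one with the larger area within each catchment/date pair.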

Data

df <-
structure(list(area = c(1L, 10L, 25L, 35L, 1L, 10L, 25L, 35L), 
    catchment = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Nope", 
    "Yup"), class = "factor"), concentration = c(2, 40.5, 50.82031, 
    50, 1, 5, 40.08333, 38)), .Names = c("area", "catchment", 
"concentration"), class = "data.frame", row.names = c("1", "2", 
"3", "4", "5", "6", "7", "8"))