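This may be basic, but I have been trying to figure it out for several days and have not found an answer.
I am trying to calculate a new quantity from two columns, 'concentration' and 'area', grouped by 'catchment'. I have written a function that takes, for each row, the difference between its concentration and the concentration of the row with the largest area, scaled by the row's area as a fraction of the largest area. It does not work with dplyr or aggregate. It works with by, but that returns a list.
Ideally, I would like to add a column to the data frame or replace the concentration column entirely. Here is the data frame 'lev':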
area catchment concentration
1 1 Yup 2.00000
2 10 Yup 40.50000
3 25 Yup 50.82031
4 35 Yup 50.00000
5 1 Nope 1.00000
6 10 Nope 5.00000
7 25 Nope 40.08333
8 35 Nope 38.00000
Here is the function:
lever <- function(data = lev, x = data[, "concentration"], y = data[, "area"]) {
  N <- which.max(y)
  L <- (x - x[N]) * y / max(y)
  return(L)
}
Here is the desired result:
area catchment concentration leverage
1 1 Yup 2.00000 -1.3714286
2 10 Yup 40.50000 -2.7142857
3 25 Yup 50.82031 0.5859375
4 35 Yup 50.00000 0.0000000
5 1 Nope 1.00000 -1.0571429
6 10 Nope 5.00000 -9.4285714
7 25 Nope 40.08333 1.4880952
8 35 Nope 38.00000 0.0000000
Using by, I can get two lists containing the results for each catchment:
by(lev, lev$catchment, lever)
But I want to use the function on multiple columns classified by multiple factors (for example, date in addition to catchment), and then I get an 'incorrect number of dimensions' error, with doBy as well.
Answer 0 (score: 1)
We can use tidyverse:
library(tidyverse)
df1 %>%
group_by(catchment) %>%
mutate(leverage = (concentration- concentration[which.max(area)]) * area/max(area))
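As described in the question, if several columns serve as grouping variables, put them all in group_by. The same calculation can also be applied to several columns with mutate_each (deprecated in current dplyr, where mutate with across does the same job).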
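A minimal sketch of that idea with current dplyr (version 1.0 or later is assumed), grouping by two variables and applying the calculation to two columns with across. The date and nitrate columns, the name lev2, and all of the added values are invented for illustration; they are not part of the question's data.

```r
library(dplyr)

# Hypothetical data: 'date' and 'nitrate' are made up for this example
lev2 <- data.frame(
  area          = rep(c(1, 10, 25, 35), times = 4),
  catchment     = rep(c("Yup", "Nope"), each = 4, times = 2),
  date          = rep(as.Date(c("2017-01-01", "2017-02-01")), each = 8),
  concentration = c(2, 40.5, 50.82031, 50, 1, 5, 40.08333, 38,
                    3, 30, 45, 44, 2, 6, 35, 30),
  nitrate       = c(0.1, 0.4, 0.9, 1.0, 0.2, 0.3, 0.8, 0.7,
                    0.2, 0.5, 1.1, 1.0, 0.1, 0.4, 0.6, 0.9)
)

result <- lev2 %>%
  group_by(catchment, date) %>%   # one group per catchment/date pair
  mutate(across(
    c(concentration, nitrate),
    # difference from the value at the largest area, scaled by area / max(area)
    ~ (.x - .x[which.max(area)]) * area / max(area),
    .names = "{.col}_leverage"
  )) %>%
  ungroup()

print(result)
```

Each new column is named after its source column with a _leverage suffix. Dropping the .names argument would overwrite the original columns instead.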
Answer 1 (score: 1)
Load your data:
lev <- read.table(text = "area catchment concentration
1 Yup 2.00000
10 Yup 40.50000
25 Yup 50.82031
35 Yup 50.00000
1 Nope 1.00000
10 Nope 5.00000
25 Nope 40.08333
35 Nope 38.00000",
header=TRUE)
Group by catchment:
library(dplyr)
lev %>%
group_by(catchment) %>%
mutate(N = which.max(area),
L = (concentration - concentration[N]) * area/max(area))
#
# area catchment concentration N L
# <int> <fctr> <dbl> <int> <dbl>
# 1 1 Yup 2.00000 4 -1.3714286
# 2 10 Yup 40.50000 4 -2.7142857
# 3 25 Yup 50.82031 4 0.5859357
# 4 35 Yup 50.00000 4 0.0000000
# 5 1 Nope 1.00000 4 -1.0571429
# 6 10 Nope 5.00000 4 -9.4285714
# 7 25 Nope 40.08333 4 1.4880929
# 8 35 Nope 38.00000 4 0.0000000
I modified your function so that it returns a data frame.
lever2 <- function(data,
                   x = data[["concentration"]],
                   y = data[["area"]]) {
  # [[ ]] extracts the column as a plain vector,
  # for both data frames and tibbles
  N <- which.max(y)
  L <- (x - x[N]) * y / max(y)
  # Put L back into the data frame
  # so that we keep the concentration and area in the result
  data$L <- L
  return(data)
}
The function can then be used with dplyr::group_by %>% do:
lev %>%
group_by(catchment) %>%
do( lever2(.))
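For comparison, the same split-apply-combine step can be sketched in base R, without dplyr. The helper name leverage_one is made up for this example. The pieces are stacked back together with rbind, so the rows come out ordered by group rather than in their original order.

```r
lev <- read.table(text = "area catchment concentration
1 Yup 2.00000
10 Yup 40.50000
25 Yup 50.82031
35 Yup 50.00000
1 Nope 1.00000
10 Nope 5.00000
25 Nope 40.08333
35 Nope 38.00000",
header = TRUE)

# Hypothetical helper: adds a leverage column to the rows of one group
leverage_one <- function(d) {
  n <- which.max(d$area)
  d$leverage <- (d$concentration - d$concentration[n]) * d$area / max(d$area)
  d
}

pieces <- split(lev, lev$catchment)                  # one data frame per catchment
out <- do.call(rbind, lapply(pieces, leverage_one))  # apply, then stack the pieces
rownames(out) <- NULL

print(out)
```

To group by several factors, pass a list to split, for example split(lev, list(lev$catchment, lev$date), drop = TRUE), where date stands for whatever extra grouping column the data has.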
Answer 2 (score: 1)
You can also use data.table to compute this:
library(data.table)
# convert to data.table
setDT(df)
df[, leverage := (concentration - concentration[which.max(area)]) * (area / max(area)),
by=catchment]
df
area catchment concentration leverage
1: 1 Yup 2.00000 -1.3714286
2: 10 Yup 40.50000 -2.7142857
3: 25 Yup 50.82031 0.5859357
4: 35 Yup 50.00000 0.0000000
5: 1 Nope 1.00000 -1.0571429
6: 10 Nope 5.00000 -9.4285714
7: 25 Nope 40.08333 1.4880929
8: 35 Nope 38.00000 0.0000000
Data
df <-
structure(list(area = c(1L, 10L, 25L, 35L, 1L, 10L, 25L, 35L),
catchment = structure(c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("Nope",
"Yup"), class = "factor"), concentration = c(2, 40.5, 50.82031,
50, 1, 5, 40.08333, 38)), .Names = c("area", "catchment",
"concentration"), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7", "8"))