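I am working with an R data frame that has the columns
GROUP_COL | TIME | VALUE
TIME is ordered, VALUE is numeric, and GROUP_COL is a categorical variable by which I want to group the data. My goal is to compute, for every row, a weighted average of the values within each GROUP_COL group, taken in TIME order:
value = 0.1 * previous_value + 0.9 * value
If there is no previous value, the value should be kept as is. The result should be stored in a column named WEIGHTED. What I have tried so far: using dplyr, I created a vector of previous values with lag():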
weighted_avg_with_previous <- function(.data, lag_weight=0.1) {
  # get previous values
  lag_val <- lag(.data$VALUE, n = 1L, default = 0, order_by = .data$TIME)
  # give each value a weight 0.9 for current value and 0.1 for previous value
  weighted = (1 -lag_weight) * .data$VALUE + lag_weight * lag_val
  return (weighted)
}
data <- data %>%
  group_by(SALES_RESPONSIBILITY, PRODUCT_AREA, CURRENCY, FORECAST_TYPE) %>%
  arrange(HORIZON, .by_group=TRUE) %>%
  mutate(WEIGHTED_VALUE = weighted_avg_with_previous(0.1))
However, the mutate statement throws an error. How can I make the weighted_avg_with_previous function run within each individual group?
Example:
GROUP | TIME| VALUE | WEIGHTED VALUE
_____________________________________
A | 1 | 1 | 1
A | 2 | 2 | 1.9
A | 3 | 3 | 2.9
A | 4 | 4 | 3.9
B | 1 | 3 | 3
B | 2 | 7 | 6.6
B | 3 | -4 | -2.9
...
Best, Julia
Answer 0 (score: 3)
library(tidyverse)
# Sample data matching the example in the question
df <- structure(list(GROUP = c("A", "A", "A", "A", "B", "B", "B"),
    TIME = c(1L, 2L, 3L, 4L, 1L, 2L, 3L), VALUE = c(1L, 2L, 3L,
    4L, 3L, 7L, -4L)), row.names = c(NA, -7L), class = c("tbl_df",
    "tbl", "data.frame"))
df %>%
  group_by(GROUP) %>%
  # lag() is evaluated within each group, so the first row of every group gets NA
  mutate(previous.value = lag(VALUE)) %>%
  # keep VALUE where there is no previous value, otherwise blend the two
  mutate(weighted.value = ifelse(is.na(previous.value), VALUE, 0.1*previous.value + 0.9*VALUE)) %>%
  select(-previous.value)
The first mutate() statement creates a new variable holding the lagged VALUE. The second creates weighted.value, which equals 0.1*previous.value + 0.9*VALUE, or simply VALUE if previous.value is NA.
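Output:
GROUP | TIME | VALUE | weighted.value
_____________________________________
A | 1 | 1 | 1
A | 2 | 2 | 1.9
A | 3 | 3 | 2.9
A | 4 | 4 | 3.9
B | 1 | 3 | 3
B | 2 | 7 | 6.6
B | 3 | -4 | -2.9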
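If you would rather keep a helper function as in the question, here is a minimal sketch of one way to do it. In the original call, weighted_avg_with_previous(0.1) passes 0.1 as the first argument (.data), so .data$VALUE is applied to a plain number; that is most likely where the error comes from. The version below takes the column vector instead, so mutate() hands it one group at a time. The parameter name value and the use of coalesce() in place of lag()'s default argument are my own choices, not part of the original code.

```r
library(dplyr)

# Hypothetical rewrite of the question's helper: it receives a plain vector
# (one group's VALUE column) rather than a data frame.
weighted_avg_with_previous <- function(value, lag_weight = 0.1) {
  # previous value within the group; NA for the group's first row
  previous <- dplyr::lag(value, n = 1L)
  # no previous value: fall back to the current value, which leaves it unchanged
  previous <- dplyr::coalesce(previous, value)
  (1 - lag_weight) * value + lag_weight * previous
}

# Sample data from the example in the question
df <- data.frame(
  GROUP = c("A", "A", "A", "A", "B", "B", "B"),
  TIME  = c(1L, 2L, 3L, 4L, 1L, 2L, 3L),
  VALUE = c(1L, 2L, 3L, 4L, 3L, 7L, -4L)
)

result <- df %>%
  group_by(GROUP) %>%
  arrange(TIME, .by_group = TRUE) %>%   # make sure lag() sees rows in TIME order
  mutate(WEIGHTED_VALUE = weighted_avg_with_previous(VALUE, lag_weight = 0.1)) %>%
  ungroup()

print(result)
```

Because the helper is called inside a grouped mutate(), lag() never reaches across a group boundary, and the first row of each group keeps its own value.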