I have a data object similar to the following:
> temp2 %>% arrange(date_val) %>% select(date_val,kpi_name,kpi_value)
# Source: spark<?> [?? x 3]
# Ordered by: date_val
date_val kpi_name kpi_value
<dttm> <chr> <dbl>
1 2018-12-04 00:00:00 KPI1 0
2 2018-12-04 00:00:00 KPI2 38
3 2018-12-04 00:01:00 KPI2 55
4 2018-12-04 00:01:00 KPI1 1
5 2018-12-04 00:02:00 KPI2 55
6 2018-12-04 00:02:00 KPI1 1
7 2018-12-04 00:03:00 KPI1 0
8 2018-12-04 00:03:00 KPI2 58
9 2018-12-04 00:04:00 KPI2 45
10 2018-12-04 00:04:00 KPI1 1
# … with more rows
For each date_val group, I want to insert new rows whose values are computed from the kpi_name/kpi_value pairs already present for that date_val. For example, suppose I need a new KPI3 calculated as 100 * (KPI1 / KPI2); that would give a new data object like:
# Source: spark<?> [?? x 3]
# Ordered by: date_val
date_val kpi_name kpi_value
<dttm> <chr> <dbl>
1 2018-12-04 00:00:00 KPI1 0
2 2018-12-04 00:00:00 KPI2 38
3 2018-12-04 00:00:00 KPI3 0
4 2018-12-04 00:01:00 KPI2 55
5 2018-12-04 00:01:00 KPI1 1
6 2018-12-04 00:01:00 KPI3 0.018
7 2018-12-04 00:02:00 KPI2 55
8 2018-12-04 00:02:00 KPI1 1
9 2018-12-04 00:02:00 KPI3 0.018
10 2018-12-04 00:03:00 KPI1 0
11 2018-12-04 00:03:00 KPI2 58
12 2018-12-04 00:03:00 KPI3 0
13 2018-12-04 00:04:00 KPI2 45
14 2018-12-04 00:04:00 KPI1 1
15 2018-12-04 00:04:00 KPI3 0.022
# … with more rows
Can this be done in dplyr?
Answer (score: 1)
This should do it:
library(tidyverse)
temp2 %>%
  spread(kpi_name, kpi_value) %>%
  mutate(KPI3 = 100 * (KPI1 / KPI2)) %>%
  gather(kpi_name, kpi_value, -date_val)
While rbind-ing the new rows in would technically work, it is comparatively inefficient and the syntax is clumsy. It makes more sense to reshape to a wide format, add the column, and then reshape back to long.
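As a side note, `spread()`/`gather()` have been superseded by `pivot_wider()`/`pivot_longer()` in tidyr 1.0+, which do the same job with clearer argument names. Below is a sketch of the equivalent pipeline, run on a small local tibble rather than the original Spark tbl (an assumption, so it can be tried without sparklyr; on an actual Spark tbl these tidyr verbs may not translate to Spark SQL, in which case a sparklyr-side pivot such as `sdf_pivot()` would be worth investigating):

```r
library(dplyr)
library(tidyr)

# Hypothetical local stand-in for the first two minutes of the sample data
temp2 <- tibble::tribble(
  ~date_val,             ~kpi_name, ~kpi_value,
  "2018-12-04 00:00:00", "KPI1",    0,
  "2018-12-04 00:00:00", "KPI2",    38,
  "2018-12-04 00:01:00", "KPI1",    1,
  "2018-12-04 00:01:00", "KPI2",    55
)

temp2 %>%
  # one row per date_val, one column per KPI
  pivot_wider(names_from = kpi_name, values_from = kpi_value) %>%
  # derived KPI computed column-wise
  mutate(KPI3 = 100 * (KPI1 / KPI2)) %>%
  # back to the long date_val / kpi_name / kpi_value layout
  pivot_longer(-date_val, names_to = "kpi_name", values_to = "kpi_value") %>%
  arrange(date_val, kpi_name)
```

Note that the wide intermediate will contain `NA` for any date_val missing one of the KPIs, which then propagates into KPI3; whether that is the desired behaviour depends on the data.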