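I have a daily-level dataset, and I am trying to create a flag for rows that have no value when aggregated over the month.
How would you code this so that contract_flag evaluates as in the following example?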
         date contract budget_case actual_case contract_flag
1  2017-01-01     F123         200         100             1
2  2017-01-02     F123         200         100             1
3  2017-01-03     F123         200         100             1
4  2017-01-04     F123         200         100             1
5  2017-01-05     F123         200         100             1
6  2017-01-06     F123         200         100             1
7  2017-01-29     F123         200         100             1
8  2017-01-30     F123         200         100             1
9  2017-01-01     K456           0           0             0
10 2017-01-02     K456           0           0             0
11 2017-01-03     K456           0           0             0
12 2017-01-04     K456           0           0             0
13 2017-01-05     K456           0           0             0
14 2017-01-06     K456           0           0             0
15 2017-01-29     K456           0           0             0
16 2017-01-30     K456           0           0             0
Answer 0 (score: 0)
We can use format.Date to extract the year and month, then use that as the group_by variable:
library(dplyr)
df %>%
  group_by(Month = format.Date(date, format = "%Y-%m")) %>%
  mutate(contract_flag = ifelse(sum(budget_case) > 0 & sum(actual_case) > 0, 1, 0))
Or using case_when:
df %>%
  group_by(Month = format.Date(date, format = "%Y-%m")) %>%
  mutate(contract_flag = case_when(sum(budget_case) > 0 & sum(actual_case) > 0 ~ 1, TRUE ~ 0))
Result:
# A tibble: 16 x 6
# Groups:   Month [3]
   date       contract budget_case actual_case contract_flag Month
   <chr>      <chr>          <int>       <int>         <dbl> <chr>
 1 2017-01-01 F123             200         100             1 2017-01
 2 2017-01-02 F123             200         100             1 2017-01
 3 2017-01-03 F123             200         100             1 2017-01
 4 2017-01-04 F123             200         100             1 2017-01
 5 2017-01-05 F123             200         100             1 2017-01
 6 2017-01-06 F123             200         100             1 2017-01
 7 2017-01-29 F123             200         100             1 2017-01
 8 2017-01-30 F123             200         100             1 2017-01
 9 2017-01-01 K456               0           0             1 2017-01
10 2017-01-02 K456               0           0             1 2017-01
11 2017-01-03 K456               0           0             1 2017-01
12 2017-01-04 K456               0           0             1 2017-01
13 2017-01-05 K456               0           0             1 2017-01
14 2017-02-06 K456               0         100             1 2017-02
15 2017-02-28 K456             200           0             1 2017-02
16 2017-03-30 K456               0           0             0 2017-03
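Note that both snippets group by month only. In the result above, the K456 rows in 2017-01 get contract_flag 1 because the F123 rows in the same month contribute to the sums. If the flag is meant to be evaluated per contract and month, as the expected contract_flag in the question suggests, one possible adjustment is to add contract to group_by. This is a sketch under that assumption, not part of the original answer; df_small is a made-up subset used only for illustration, and format(as.Date(date), ...) is swapped in for format.Date so the character dates are parsed explicitly.

```r
library(dplyr)

# Hypothetical illustrative data with the same columns as df (not the original data)
df_small <- data.frame(
  date = c("2017-01-01", "2017-01-02", "2017-01-01",
           "2017-01-02", "2017-02-06", "2017-02-28"),
  contract = c("F123", "F123", "K456", "K456", "K456", "K456"),
  budget_case = c(200L, 200L, 0L, 0L, 0L, 200L),
  actual_case = c(100L, 100L, 0L, 0L, 100L, 0L),
  stringsAsFactors = FALSE
)

flagged <- df_small %>%
  # Group by contract as well as month, so each contract is judged on its own totals
  group_by(contract, Month = format(as.Date(date), "%Y-%m")) %>%
  # Same condition as in the answer: both monthly sums must be positive
  mutate(contract_flag = as.integer(sum(budget_case) > 0 & sum(actual_case) > 0)) %>%
  ungroup()

flagged
```

With this grouping, the K456 rows in 2017-01 get 0, since that contract's own January totals are zero.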
Data (the date column was modified to include examples from more months):
df <- structure(list(date = c("2017-01-01", "2017-01-02", "2017-01-03",
"2017-01-04", "2017-01-05", "2017-01-06", "2017-01-29", "2017-01-30",
"2017-01-01", "2017-01-02", "2017-01-03", "2017-01-04", "2017-01-05",
"2017-02-06", "2017-02-28", "2017-03-30"), contract = c("F123",
"F123", "F123", "F123", "F123", "F123", "F123", "F123", "K456",
"K456", "K456", "K456", "K456", "K456", "K456", "K456"), budget_case = c(200L,
200L, 200L, 200L, 200L, 200L, 200L, 200L, 0L, 0L, 0L, 0L, 0L,
0L, 200L, 0L), actual_case = c(100L, 100L, 100L, 100L, 100L,
100L, 100L, 100L, 0L, 0L, 0L, 0L, 0L, 100L, 0L, 0L), contract_flag = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("date",
"contract", "budget_case", "actual_case", "contract_flag"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16"))