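My data frame 'test' is shown below.
I have two different operations that I'd like to perform on two different columns, and if possible I'd like to solve this with an efficient dplyr or purrr approach.
Operation #1: I'd like to fill the NA values of 'amt_needed' with the two 'remaining' values above them. (This is a test data frame; in the real version I'll have many more rows, and each time I'd like the two 'amt_needed' values to equal the two 'remaining' values from the two rows above.)
Operation #2: The two NA values of 'remaining' should be the new 'amt_needed' value minus sum(contrib) of a and b.
Any ideas or suggestions are appreciated!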
Answer 0 (score: 1)
Based on the new data provided in the OP, one solution using dplyr could be:
library(dplyr)

# Data
test <- data.frame(
  date = c("2018-01-01", "2018-01-01", "2018-01-15", "2018-01-15", "2018-01-30", "2018-01-30"),
  name = c("a", "b", "a", "b", "a", "b"),
  contrib = c(4, 2, 4, 2, 4, 2),
  amt_needed = c(100, 100, NA, NA, NA, NA),
  remaining = c(94, 94, NA, NA, NA, NA)
)

# Convert the date column to Date
test$date <- as.Date(test$date, "%Y-%m-%d")

# Use the initial amt_needed on every row; the per-date value is derived below
test$amt_needed <- test$amt_needed[1]

test %>%
  arrange(date, name) %>%
  group_by(date) %>%
  # Total contribution of a and b on each date
  mutate(group_contrib = sum(contrib)) %>%
  ungroup() %>%
  select(date, group_contrib) %>%
  unique() %>%
  arrange(date) %>%
  # Running total of the per-date contributions
  mutate(cumm_group_sum = cumsum(group_contrib)) %>%
  inner_join(test, by = "date") %>%
  mutate(remaining = amt_needed - cumm_group_sum) %>%
  # amt_needed for a date equals the previous date's remaining
  mutate(amt_needed_act = remaining + group_contrib) %>%
  select(date, name, contrib, amt_needed_act, remaining)
# A tibble: 6 x 5
  date       name   contrib amt_needed_act remaining
  <date>     <fctr>   <dbl>          <dbl>     <dbl>
1 2018-01-01 a         4.00          100        94.0
2 2018-01-01 b         2.00          100        94.0
3 2018-01-15 a         4.00           94.0      88.0
4 2018-01-15 b         2.00           94.0      88.0
5 2018-01-30 a         4.00           88.0      82.0
6 2018-01-30 b         2.00           88.0      82.0
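The same result can probably be reached without the intermediate join. The following is only a sketch under two assumptions that are not stated in the question: the first row of 'amt_needed' holds the starting amount, and every date's contributions are deducted in date order. The names start_amt, cum_contrib and result are made up for illustration.

```r
library(dplyr)

test <- data.frame(
  date = as.Date(c("2018-01-01", "2018-01-01", "2018-01-15",
                   "2018-01-15", "2018-01-30", "2018-01-30")),
  name = c("a", "b", "a", "b", "a", "b"),
  contrib = c(4, 2, 4, 2, 4, 2),
  amt_needed = c(100, 100, NA, NA, NA, NA),
  remaining = c(94, 94, NA, NA, NA, NA)
)

# Assumption: the first row carries the starting amount
start_amt <- test$amt_needed[1]

result <- test %>%
  arrange(date, name) %>%
  group_by(date) %>%
  mutate(group_contrib = sum(contrib)) %>%  # total of a and b on this date
  ungroup() %>%
  mutate(
    # Running total that counts each date's group total only once
    cum_contrib = cumsum(if_else(!duplicated(date), group_contrib, 0)),
    remaining = start_amt - cum_contrib,
    # amt_needed is what was left before this date's contributions
    amt_needed = remaining + group_contrib
  ) %>%
  select(date, name, contrib, amt_needed, remaining)

print(result)
```

Here the running total is built directly on the row-level data by zeroing out all but the first row of each date, so no separate per-date summary table is needed. If the real data has more than one starting amount, for example one per account, this would need an extra grouping variable.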