map / dplyr approach for dynamically filling two columns in a data frame

Time: 2018-01-28 02:38:52

Tags: r dplyr purrr

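My data frame 'test' is shown below.

I have two different operations that I want to perform on two different columns, and if possible I would like to solve this with an efficient dplyr or purrr approach.

Operation #1: I'd like to fill the NA values of 'amt_needed' with the two 'remaining' values from above. (This is a test data frame; the real version will have many more rows, and each time I'd like the two 'amt_needed' values to equal the two 'remaining' values from the two rows above.)

Operation #2: The two NA values of 'remaining' should be the new 'amt_needed' value minus sum(contrib) of a and b.

Any ideas/suggestions appreciated!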

Operator    clockIN                 clockOUT
DG01E0020   2018-01-27 09:36:46.000 2018-01-27 15:25:53.000
DG01E0027   2018-01-27 10:54:53.000 2018-01-27 19:39:13.000
DG01E0025   2018-01-27 11:48:44.000 2018-01-27 15:32:02.000
DG01E0013   2018-01-27 12:02:56.000 2018-01-27 17:50:41.000
DG01E0032   2018-01-27 12:04:01.000 2018-01-27 16:27:09.000
DG01E0024   2018-01-27 12:04:33.000 2018-01-27 15:21:21.000
DG01E0015   2018-01-27 12:05:49.000 2018-01-27 15:57:39.000
DG01E0012   2018-01-27 12:16:58.000 2018-01-28 00:34:36.000
DG01E0023   2018-01-27 13:04:49.000 2018-01-27 17:05:37.000
DG01E0020   2018-01-27 16:47:20.000 2018-01-27 22:19:13.000
DG01E0032   2018-01-27 18:30:15.000 2018-01-27 23:11:30.000
DG01E0025   2018-01-27 18:45:24.000 2018-01-28 00:02:26.000
DG01E0015   2018-01-27 18:48:04.000 2018-01-28 00:30:13.000
DG01E0024   2018-01-27 18:52:47.000 2018-01-28 00:02:50.000
DG01E0023   2018-01-27 19:08:48.000 2018-01-28 00:32:21.000
DG01E0013   2018-01-27 19:23:14.000 2018-01-27 23:12:12.000
DG01E0013   2018-01-28 00:02:20.000 2018-01-28 00:02:43.000

1 answer:

Answer 0: (score: 1)

Based on the new data provided in the OP, one solution using dplyr could be:

    library(dplyr)

    # Data
    test <- data.frame(date = c("2018-01-01", "2018-01-01", "2018-01-15", "2018-01-15", "2018-01-30", "2018-01-30"),
                       name = c("a","b","a","b", "a","b"),
                       contrib = c(4,2,4,2,4,2),
                       amt_needed = c(100,100, NA,NA, NA,NA),
                       remaining = c(94,94, NA,NA, NA,NA))

    # Change column to date
    test$date <- as.Date(test$date, "%Y-%m-%d")
    # Carry the starting amount (first row) down the whole column
    test$amt_needed <- test$amt_needed[1]

    test %>%
      arrange(date, name) %>%
      group_by(date) %>%
      # Total contribution of a and b on each date
      mutate(group_contrib = sum(contrib)) %>%
      ungroup() %>%
      select(date, group_contrib) %>%
      distinct() %>%
      arrange(date) %>%
      # Running total of contributions across dates
      mutate(cumm_group_sum = cumsum(group_contrib)) %>%
      inner_join(test, by = "date") %>%
      mutate(remaining = amt_needed - cumm_group_sum) %>%
      mutate(amt_needed_act = remaining + group_contrib) %>%
      select(date, name, contrib, amt_needed_act, remaining)

    # A tibble: 6 x 5
      date       name   contrib amt_needed_act remaining
      <date>     <fctr>   <dbl>          <dbl>     <dbl>
    1 2018-01-01 a         4.00          100        94.0
    2 2018-01-01 b         2.00          100        94.0
    3 2018-01-15 a         4.00           94.0      88.0
    4 2018-01-15 b         2.00           94.0      88.0
    5 2018-01-30 a         4.00           88.0      82.0
    6 2018-01-30 b         2.00           88.0      82.0
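
The same result can probably be reached a little more directly by summarising to one row per date first and joining that back. The following is only a sketch under a few assumptions: the starting amount is taken from the first row of 'amt_needed', every date contains the full set of contributors, and the names `start_amt`, `per_date` and `result` are made up for this illustration.

```r
library(dplyr)

# Same sample data as in the question
test <- data.frame(
  date = as.Date(c("2018-01-01", "2018-01-01", "2018-01-15",
                   "2018-01-15", "2018-01-30", "2018-01-30")),
  name = c("a", "b", "a", "b", "a", "b"),
  contrib = c(4, 2, 4, 2, 4, 2),
  amt_needed = c(100, 100, NA, NA, NA, NA),
  remaining = c(94, 94, NA, NA, NA, NA)
)

# Assumption: the first row holds the starting amount
start_amt <- test$amt_needed[1]

# One row per date: total contribution, then the running balance
per_date <- test %>%
  group_by(date) %>%
  summarise(group_contrib = sum(contrib)) %>%
  arrange(date) %>%
  mutate(
    remaining  = start_amt - cumsum(group_contrib),  # balance after this date
    amt_needed = remaining + group_contrib           # balance before this date
  )

# Attach the per-date values back onto the individual rows
result <- test %>%
  select(date, name, contrib) %>%
  inner_join(per_date, by = "date") %>%
  select(date, name, contrib, amt_needed, remaining)

print(result)
```

Because 'remaining' is just the starting amount minus a cumulative sum, no row-by-row iteration (for example with purrr) should be needed for this particular recurrence.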