Replace blanks with the latest value for that sector

Time: 2019-03-01 02:23:49

Tags: r dataframe

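I want to replace every blank price with the previous week's price for the same sector. How can I do this in R? Could someone suggest a quick solution? Any suggestions would be highly appreciated.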

The input data frame is:

sec     date      price
sec11   6/1/2019    309
sec12   7/1/2019    412
sec13   8/1/2019    206
sec14   9/1/2019    103
sec15   10/1/2019   257.5
sec16   11/1/2019   803.4
sec17   12/1/2019   638.6
sec11   13/1/2019   300
sec12   14/1/2019   400
sec13   15/1/2019   200
sec14   16/1/2019   100
sec15   17/1/2019   250
sec16   18/1/2019   780
sec17   19/1/2019   620
sec11   20/1/2019   
sec12   21/1/2019   
sec13   22/1/2019   
sec14   23/1/2019   
sec15   24/1/2019   
sec16   25/1/2019   
sec17   26/1/2019   

Expected output:

sec     date      price
sec11   6/1/2019    309
sec12   7/1/2019    412
sec13   8/1/2019    206
sec14   9/1/2019    103
sec15   10/1/2019   257.5
sec16   11/1/2019   803.4
sec17   12/1/2019   638.6
sec11   13/1/2019   300
sec12   14/1/2019   400
sec13   15/1/2019   200
sec14   16/1/2019   100
sec15   17/1/2019   250
sec16   18/1/2019   780
sec17   19/1/2019   620
sec11   20/1/2019   300
sec12   21/1/2019   400
sec13   22/1/2019   200
sec14   23/1/2019   100
sec15   24/1/2019   250
sec16   25/1/2019   780
sec17   26/1/2019   620

1 Answer:

Answer 0 (score: 3)

Option 1: tidyr::fill with dplyr::group_by

This is straightforward with fill and group_by from the tidyverse:

library(tidyverse)
df %>%
    group_by(sec) %>%
    fill(price) %>%
    ungroup()
#
## A tibble: 21 x 3
#   sec   date      price
#   <fct> <fct>     <dbl>
# 1 sec11 6/1/2019    309
# 2 sec11 13/1/2019   300
# 3 sec11 20/1/2019   300
# 4 sec12 7/1/2019    412
# 5 sec12 14/1/2019   400
# 6 sec12 21/1/2019   400
# 7 sec13 8/1/2019    206
# 8 sec13 15/1/2019   200
# 9 sec13 22/1/2019   200
#10 sec14 9/1/2019    103
## ... with 11 more rows

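In the output above the rows have been reordered by group. To make sure we reproduce the expected output exactly, we can add a row number first and sort the final result by that original row number: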

df %>%
    rowid_to_column() %>%
    group_by(sec) %>%
    fill(price) %>%
    ungroup() %>%
    arrange(rowid) %>%
    select(-rowid) %>%
    as.data.frame()
#        sec      date price
#1  sec11  6/1/2019 309.0
#2  sec12  7/1/2019 412.0
#3  sec13  8/1/2019 206.0
#4  sec14  9/1/2019 103.0
#5  sec15 10/1/2019 257.5
#6  sec16 11/1/2019 803.4
#7  sec17 12/1/2019 638.6
#8  sec11 13/1/2019 300.0
#9  sec12 14/1/2019 400.0
#10 sec13 15/1/2019 200.0
#11 sec14 16/1/2019 100.0
#12 sec15 17/1/2019 250.0
#13 sec16 18/1/2019 780.0
#14 sec17 19/1/2019 620.0
#15 sec11 20/1/2019 300.0
#16 sec12 21/1/2019 400.0
#17 sec13 22/1/2019 200.0
#18 sec14 23/1/2019 100.0
#19 sec15 24/1/2019 250.0
#20 sec16 25/1/2019 780.0
#21 sec17 26/1/2019 620.0
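
To see why the group_by step matters, here is a minimal sketch on a hypothetical two-sector data frame (`toy`, with made-up sector names "a" and "b"; it is not part of the question's data). Without grouping, fill carries down whatever value sits in the previous row, which belongs to a different sector.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two sectors, second week missing for both
toy <- data.frame(
  sec   = c("a", "b", "a", "b"),
  price = c(10, 20, NA, NA)
)

# Ungrouped: fill() takes the last non-missing value from the rows above,
# regardless of sector, so sector "a" inherits sector "b"'s price
ungrouped <- toy %>% fill(price)

# Grouped: each sector only sees its own earlier values
grouped <- toy %>%
  group_by(sec) %>%
  fill(price) %>%
  ungroup()

print(ungrouped)
print(grouped)
```

By default fill works downwards, which matches the "last known price" requirement here, assuming the rows of each sector are already in date order.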

Option 2: zoo::na.locf with base R's ave

library(zoo)
# na.rm = FALSE keeps leading NAs, so each group keeps its length as ave() requires
transform(df, price = ave(price, sec, FUN = function(x) na.locf(x, na.rm = FALSE)))
#     sec      date price
#1  sec11  6/1/2019 309.0
#2  sec12  7/1/2019 412.0
#3  sec13  8/1/2019 206.0
#4  sec14  9/1/2019 103.0
#5  sec15 10/1/2019 257.5
#6  sec16 11/1/2019 803.4
#7  sec17 12/1/2019 638.6
#8  sec11 13/1/2019 300.0
#9  sec12 14/1/2019 400.0
#10 sec13 15/1/2019 200.0
#11 sec14 16/1/2019 100.0
#12 sec15 17/1/2019 250.0
#13 sec16 18/1/2019 780.0
#14 sec17 19/1/2019 620.0
#15 sec11 20/1/2019 300.0
#16 sec12 21/1/2019 400.0
#17 sec13 22/1/2019 200.0
#18 sec14 23/1/2019 100.0
#19 sec15 24/1/2019 250.0
#20 sec16 25/1/2019 780.0
#21 sec17 26/1/2019 620.0
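
If adding zoo as a dependency is not an option, the same "last observation carried forward" idea can be sketched in base R alone. The helper `locf` below is a hypothetical name, not a function from zoo or base R, and the toy data frame is made up for illustration.

```r
# Hypothetical helper: carry the last non-missing value forward
locf <- function(x) {
  # For each position, the index of the most recent non-NA element (0 if none yet)
  idx <- cummax(seq_along(x) * !is.na(x))
  # Prepend NA so that idx == 0 (leading gaps) maps to NA
  c(NA, x)[idx + 1]
}

# Hypothetical toy data: two sectors, second week missing for both
toy <- data.frame(
  sec   = c("a", "b", "a", "b"),
  price = c(10, 20, NA, NA)
)

# ave() applies locf within each sector and keeps the original row order
filled <- transform(toy, price = ave(price, sec, FUN = locf))
print(filled)
```

Like the ave call above, this leaves the row order untouched, and a sector whose first price is missing keeps NA there rather than failing.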

Sample data

df <- read.table(text =
    "sec     date      price
sec11   6/1/2019    309
sec12   7/1/2019    412
sec13   8/1/2019    206
sec14   9/1/2019    103
sec15   10/1/2019   257.5
sec16   11/1/2019   803.4
sec17   12/1/2019   638.6
sec11   13/1/2019   300
sec12   14/1/2019   400
sec13   15/1/2019   200
sec14   16/1/2019   100
sec15   17/1/2019   250
sec16   18/1/2019   780
sec17   19/1/2019   620
sec11   20/1/2019   ''
sec12   21/1/2019   ''
sec13   22/1/2019   ''
sec14   23/1/2019   ''
sec15   24/1/2019   ''
sec16   25/1/2019   ''
sec17   26/1/2019   ''   ", header = TRUE)
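
Both options rely on the rows of each sector already being in chronological order. The sample data is ordered that way, but if real data is not, a sketch like the following could be run first. It assumes the dates are day/month/year strings as in the question; the `toy` data frame is a hypothetical, deliberately shuffled excerpt.

```r
# Hypothetical shuffled excerpt for a single sector
toy <- data.frame(
  sec   = c("sec11", "sec11", "sec11"),
  date  = c("20/1/2019", "6/1/2019", "13/1/2019"),
  price = c(NA, 309, 300)
)

# Parse day/month/year strings into Date objects so they sort chronologically
toy$date <- as.Date(toy$date, format = "%d/%m/%Y")

# Order by sector, then by date, so the blank row follows its previous weeks
toy <- toy[order(toy$sec, toy$date), ]
print(toy)
```

After this step the blank row sits last within its sector, so either option above would fill it with the most recent earlier price.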