R: na.locf behaving unexpectedly

Date: 2018-08-22 01:13:44

Tags: r zoo

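I am trying to use the na.locf function inside mutate, but I am getting a strange answer. The data is sorted by date; then, where the column is NA, the value is taken from na.locf, and otherwise the column's own value is used. For most of the data the answer comes back as expected, but one row is filled with the next non-NA value rather than the previous one. If we sort the data by date ascending and use na.rm = FALSE with fromLast = TRUE, it works as expected, but I would like to understand why the result is wrong when the dates are sorted in descending order.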

Here is an example:

# Packages used below: dplyr (mutate, arrange, %>%), tidyr (replace_na), zoo (na.locf)
library(dplyr)
library(tidyr)
library(zoo)

example = data.frame(Date = factor(c("1/14/15", "1/29/15", "2/3/15", 
    "2/11/15", "2/15/15", "3/4/15","3/7/15",  "3/7/15", "3/11/15", 
    "3/18/15", "3/21/15", "4/22/15", "4/22/15", "4/23/15", "5/6/15", 
    "5/13/15", "5/18/15", "5/24/15", "5/26/15", "5/28/15", "5/29/15", 
    "5/29/15", "6/25/15", "6/25/15","8/6/15",  "8/15/15", "8/20/15", 
    "8/22/15", "8/22/15", "8/29/15")),
   Scan = c(1, rep(NA, 21),2,rep(NA,7)),
   Hours = c(rep(NA,3), rep(3,3), NA, 2, rep(3,3), NA, 2, 3, 2, 
    rep(3,5), NA, 2, rep(c(NA, 3),2), 3, NA, 2, 3)
                   )
example %>% 
  mutate(
     date = as.Date(Date, "%m/%d/%y"),
     Hours = replace_na(Hours,0),
     scan_date = as.Date(ifelse(is.na(Scan), 
                            NA,
                            date),
                       origin="1970-01-01")) %>% 
  arrange(desc(date)) %>%
  mutate(
         scan_new = ifelse(is.na(Scan),
                na.locf(Scan), 
                Scan))

The problem is in row 24 of the result, where scan_new comes out as 1 instead of 2:

      Date Scan Hours       date  scan_date scan_new
23  3/7/15   NA     0 2015-03-07       <NA>        2
24  3/7/15   NA     2 2015-03-07       <NA>        1
25  3/4/15   NA     3 2015-03-04       <NA>        2

Interestingly, other rows that share a date are handled correctly, for example rows 18-19:

      Date Scan Hours       date  scan_date scan_new
18 4/22/15   NA     0 2015-04-22       <NA>        2
19 4/22/15   NA     2 2015-04-22       <NA>        2

For reference, the following gives the expected answer:

example %>% 
  mutate(
     date = as.Date(Date, "%m/%d/%y"),
     Hours = replace_na(Hours,0),
     scan_date = as.Date(ifelse(is.na(Scan), 
                            NA,
                            date),
                       origin="1970-01-01")) %>% 
  arrange(date) %>%
  mutate(
         scan_new = ifelse(is.na(Scan),
                na.locf(Scan, na.rm = FALSE, fromLast = TRUE), 
                Scan))

      Date Scan Hours       date  scan_date scan_new
6   3/4/15   NA     3 2015-03-04       <NA>        2
7   3/7/15   NA     0 2015-03-07       <NA>        2
8   3/7/15   NA     2 2015-03-07       <NA>        2

Can someone tell me why it behaves this way?

1 Answer:

Answer 0 (score: 2)

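In your first attempt, na.locf(Scan), the leading NAs are dropped (na.rm = TRUE is the default), so the result is shorter than the column: 24 values instead of 30. ifelse then recycles those values to the full length, which shifts them against the rows, and row 24 ends up with the 1 that belongs to the last row. You can use na.rm = FALSE (or na.locf0, see the comments) to see the result for reference: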

example %>% 
    mutate(
        date = as.Date(Date, "%m/%d/%y"),
        Hours = replace_na(Hours,0),
        scan_date = as.Date(ifelse(is.na(Scan), 
            NA,
            date),
            origin="1970-01-01")) %>% 
    arrange(desc(date)) %>%
    mutate(
        scan_new = ifelse(is.na(Scan),
            na.locf(Scan, na.rm = FALSE), 
            Scan))

#       Date Scan Hours       date  scan_date scan_new
# 1  8/29/15   NA     3 2015-08-29       <NA>       NA
# 2  8/22/15   NA     0 2015-08-22       <NA>       NA
# 3  8/22/15   NA     2 2015-08-22       <NA>       NA
# 4  8/20/15   NA     3 2015-08-20       <NA>       NA
# 5  8/15/15   NA     3 2015-08-15       <NA>       NA
# 6   8/6/15   NA     0 2015-08-06       <NA>       NA
# 7  6/25/15    2     0 2015-06-25 2015-06-25        2
# 8  6/25/15   NA     3 2015-06-25       <NA>        2
# 9  5/29/15   NA     0 2015-05-29       <NA>        2
# 10 5/29/15   NA     2 2015-05-29       <NA>        2
# 11 5/28/15   NA     3 2015-05-28       <NA>        2
# 12 5/26/15   NA     3 2015-05-26       <NA>        2
# 13 5/24/15   NA     3 2015-05-24       <NA>        2
# 14 5/18/15   NA     3 2015-05-18       <NA>        2
# 15 5/13/15   NA     3 2015-05-13       <NA>        2
# 16  5/6/15   NA     2 2015-05-06       <NA>        2
# 17 4/23/15   NA     3 2015-04-23       <NA>        2
# 18 4/22/15   NA     0 2015-04-22       <NA>        2
# 19 4/22/15   NA     2 2015-04-22       <NA>        2
# 20 3/21/15   NA     3 2015-03-21       <NA>        2
# 21 3/18/15   NA     3 2015-03-18       <NA>        2
# 22 3/11/15   NA     3 2015-03-11       <NA>        2
# 23  3/7/15   NA     0 2015-03-07       <NA>        2
# 24  3/7/15   NA     2 2015-03-07       <NA>        2
# 25  3/4/15   NA     3 2015-03-04       <NA>        2
# 26 2/15/15   NA     3 2015-02-15       <NA>        2
# 27 2/11/15   NA     3 2015-02-11       <NA>        2
# 28  2/3/15   NA     0 2015-02-03       <NA>        2
# 29 1/29/15   NA     0 2015-01-29       <NA>        2
# 30 1/14/15    1     0 2015-01-14 2015-01-14        1
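
The length mismatch can be checked on a bare vector, without the data frame. This is only a minimal sketch: the vector `scan` is an illustrative stand-in with the same shape as Scan after arrange(desc(date)), and the other variable names are made up for the example.

```r
library(zoo)

# Stand-in for Scan after arrange(desc(date)):
# 6 leading NAs, a 2, 22 NAs, then a 1 in the last position (30 values)
scan <- c(rep(NA, 6), 2, rep(NA, 22), 1)

# Default na.rm = TRUE drops the leading NAs, so the result is shorter
filled <- na.locf(scan)
length(scan)
length(filled)

# ifelse() silently recycles the shorter vector to the length of the test,
# so the filled values no longer line up with their original rows
shifted <- ifelse(is.na(scan), filled, scan)
shifted[24]

# Length-preserving variants keep every value on its own row;
# the leading NAs simply stay NA
kept  <- na.locf(scan, na.rm = FALSE)
kept0 <- na.locf0(scan)
kept[c(1, 7, 24, 30)]
```

Comparing length(filled) with length(scan) before passing the result to ifelse is a quick way to spot this kind of silent recycling.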