dplyr group_by + mutate: strange NAs appear

Time: 2015-11-24 12:10:12

Tags: r dplyr na

I have a data.frame like this:

datdf  <- structure(list(BM = rep("1907-01-01", 20), 
                         ct = structure(rep(c(1L, 2L), each = 5, times = 2), 
                                        .Label = c("B", "A"), class = "factor"), 
                         val = c(rep(NA, 10), 9901:9910), 
                         facet = rep(c(1, 2), each = 10) ), 
                    row.names = c(NA, -20L), 
                    .Names = c("BM", "ct", "val", "facet"), 
                    class = c("tbl_df", "tbl", "data.frame"))

My problem is as follows. After some grouped mutations (I need cumsum), I get NA values in one of the groups. It is not only cumsum: any modification of val yields NA.

datdf %>% group_by(BM, facet, ct) %>% mutate(v1 = val + 100, v2 = cumsum(val), v3 = val)

#            BM     ct   val facet    v1    v2    v3
#         (chr) (fctr) (int) (dbl) (dbl) (int) (int)
# 11 1907-01-01      B  9901     2 10001  9901  9901
# 12 1907-01-01      B  9902     2 10002 19803  9902
# 13 1907-01-01      B  9903     2 10003 29706  9903
# 14 1907-01-01      B  9904     2 10004 39610  9904
# 15 1907-01-01      B  9905     2 10005 49515  9905
# 16 1907-01-01      A  9906     2    NA    NA  9906
# 17 1907-01-01      A  9907     2    NA    NA  9907
# 18 1907-01-01      A  9908     2    NA    NA  9908
# 19 1907-01-01      A  9909     2    NA    NA  9909
# 20 1907-01-01      A  9910     2    NA    NA  9910

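My dplyr version is 0.4.3 and R is 3.1.3.

Is this a bug, or am I missing something? I don't remember running into this with dplyr 0.4.1, before I updated a few weeks ago.

How can I work around it for now?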

2 Answers:

Answer 0: (score: 1)

A workaround is to use the mapvalues function from plyr to replace the NAs with zero:

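For v2 only (the cumsum column). Note that this snippet has no group_by, so the cumulative sum runs over the whole column rather than within each group: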

library(plyr)   
datdf %>%  mutate(v1 = val + 100, 
                       v2 = cumsum(val %>% mapvalues(NA, 0)), 
                       v3 = val)

Output:

           BM     ct   val facet    v1    v2    v3
        (chr) (fctr) (int) (dbl) (dbl) (dbl) (int)
1  1907-01-01      B    NA     1    NA     0    NA
2  1907-01-01      B    NA     1    NA     0    NA
3  1907-01-01      B    NA     1    NA     0    NA
4  1907-01-01      B    NA     1    NA     0    NA
5  1907-01-01      B    NA     1    NA     0    NA
6  1907-01-01      A    NA     1    NA     0    NA
7  1907-01-01      A    NA     1    NA     0    NA
8  1907-01-01      A    NA     1    NA     0    NA
9  1907-01-01      A    NA     1    NA     0    NA
10 1907-01-01      A    NA     1    NA     0    NA
11 1907-01-01      B  9901     2 10001  9901  9901
12 1907-01-01      B  9902     2 10002 19803  9902
13 1907-01-01      B  9903     2 10003 29706  9903
14 1907-01-01      B  9904     2 10004 39610  9904
15 1907-01-01      B  9905     2 10005 49515  9905
16 1907-01-01      A  9906     2 10006 59421  9906
17 1907-01-01      A  9907     2 10007 69328  9907
18 1907-01-01      A  9908     2 10008 79236  9908
19 1907-01-01      A  9909     2 10009 89145  9909
20 1907-01-01      A  9910     2 10010 99055  9910

For all columns:

datdf %>%   mutate(v1 = val  %>% mapvalues(NA, 0) + 100, 
                   v2 = cumsum(val %>% mapvalues(NA, 0)), 
                   v3 = val %>% mapvalues(NA, 0))

Output:

           BM     ct   val facet    v1    v2    v3
        (chr) (fctr) (int) (dbl) (dbl) (dbl) (dbl)
1  1907-01-01      B    NA     1   100     0     0
2  1907-01-01      B    NA     1   100     0     0
3  1907-01-01      B    NA     1   100     0     0
4  1907-01-01      B    NA     1   100     0     0
5  1907-01-01      B    NA     1   100     0     0
6  1907-01-01      A    NA     1   100     0     0
7  1907-01-01      A    NA     1   100     0     0
8  1907-01-01      A    NA     1   100     0     0
9  1907-01-01      A    NA     1   100     0     0
10 1907-01-01      A    NA     1   100     0     0
11 1907-01-01      B  9901     2 10001  9901  9901
12 1907-01-01      B  9902     2 10002 19803  9902
13 1907-01-01      B  9903     2 10003 29706  9903
14 1907-01-01      B  9904     2 10004 39610  9904
15 1907-01-01      B  9905     2 10005 49515  9905
16 1907-01-01      A  9906     2 10006 59421  9906
17 1907-01-01      A  9907     2 10007 69328  9907
18 1907-01-01      A  9908     2 10008 79236  9908
19 1907-01-01      A  9909     2 10009 89145  9909
20 1907-01-01      A  9910     2 10010 99055  9910
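
If the goal is a cumulative sum within each BM/facet/ct group, with NAs counted as zero, one option that avoids dplyr's grouped mutate altogether is base R's ave. The following is only a sketch under that assumption: it rebuilds the question's data as a plain data.frame, and val0 is a name introduced here purely for illustration.

```r
# Rebuild the question's data as a plain data.frame (same values as datdf).
datdf <- data.frame(
  BM    = rep("1907-01-01", 20),
  ct    = factor(rep(c("B", "A"), each = 5, times = 2), levels = c("B", "A")),
  val   = c(rep(NA, 10), 9901:9910),
  facet = rep(c(1, 2), each = 10),
  stringsAsFactors = FALSE
)

# Treat NA as 0 (val0 is an illustrative helper vector, not part of the question).
val0 <- ifelse(is.na(datdf$val), 0L, datdf$val)

# Cumulative sum within each BM/facet/ct group, using base R only.
datdf$v2 <- ave(val0, datdf$BM, datdf$facet, datdf$ct, FUN = cumsum)

print(datdf[11:20, ])
```

Unlike the ungrouped output above, v2 here restarts at the first row of each group, so rows 16 to 20 accumulate only the values of the facet 2 / ct A group.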

Answer 1: (score: 1)

Perhaps you have run into this issue: https://github.com/hadley/dplyr/issues/1448#issuecomment-150037548

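Try this (plyr::mutate does not use the dplyr grouping, so v2 accumulates across groups, as the output shows):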

datdf %>% group_by(BM, facet,ct) %>% plyr::mutate(v1 = val + 100, v2 = cumsum(val[!is.na(val)]), v3 = val)

               BM     ct   val facet    v1    v2    v3
            (chr) (fctr) (int) (dbl) (dbl) (int) (int)
    11 1907-01-01      B  9901     2 10001  9901  9901
    12 1907-01-01      B  9902     2 10002 19803  9902
    13 1907-01-01      B  9903     2 10003 29706  9903
    14 1907-01-01      B  9904     2 10004 39610  9904
    15 1907-01-01      B  9905     2 10005 49515  9905
    16 1907-01-01      A  9906     2 10006 59421  9906
    17 1907-01-01      A  9907     2 10007 69328  9907
    18 1907-01-01      A  9908     2 10008 79236  9908
    19 1907-01-01      A  9909     2 10009 89145  9909
    20 1907-01-01      A  9910     2 10010 99055  9910
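
Subsetting with val[!is.na(val)] shortens the vector, so the result only lines up with the rows when the number of non-NA values happens to fit. A length-preserving, per-group variant can be sketched in base R. Here cumsum_skip_na is a hypothetical helper written for this example, not a function from dplyr or plyr, and the data is rebuilt as a plain data.frame.

```r
# Rebuild the question's data as a plain data.frame (same values as datdf).
datdf <- data.frame(
  BM    = rep("1907-01-01", 20),
  ct    = factor(rep(c("B", "A"), each = 5, times = 2), levels = c("B", "A")),
  val   = c(rep(NA, 10), 9901:9910),
  facet = rep(c(1, 2), each = 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: cumulative sum over the non-NA entries only,
# leaving NA in place so the output has the same length as the input.
cumsum_skip_na <- function(x) {
  out <- x
  keep <- !is.na(x)
  out[keep] <- cumsum(x[keep])
  out
}

# Apply the helper within each BM/facet/ct group via base R's ave().
datdf$v2 <- ave(datdf$val, datdf$BM, datdf$facet, datdf$ct,
                FUN = cumsum_skip_na)

print(datdf)
```

With this variant the all-NA groups in facet 1 stay NA instead of becoming 0, which may or may not be what is wanted; the zero-filling approach from the other answer is the alternative.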