I have a data.frame like this:
datdf <- structure(list(BM = rep("1907-01-01", 20),
ct = structure(rep(c(1L, 2L), each = 5, times = 2),
.Label = c("B", "A"), class = "factor"),
val = c(rep(NA, 10), 9901:9910),
facet = rep(c(1, 2), each = 10) ),
row.names = c(NA, -20L),
.Names = c("BM", "ct", "val", "facet"),
class = c("tbl_df", "tbl", "data.frame"))
My problem is as follows. After some grouped mutates (I need cumsum), I get NA values in one of the groups. And it's not just cumsum: any modification of val yields NA.
datdf %>% group_by(BM, facet, ct) %>% mutate(v1 = val + 100, v2 = cumsum(val), v3 = val)
# BM ct val facet v1 v2 v3
# (chr) (fctr) (int) (dbl) (dbl) (int) (int)
# 11 1907-01-01 B 9901 2 10001 9901 9901
# 12 1907-01-01 B 9902 2 10002 19803 9902
# 13 1907-01-01 B 9903 2 10003 29706 9903
# 14 1907-01-01 B 9904 2 10004 39610 9904
# 15 1907-01-01 B 9905 2 10005 49515 9905
# 16 1907-01-01 A 9906 2 NA NA 9906
# 17 1907-01-01 A 9907 2 NA NA 9907
# 18 1907-01-01 A 9908 2 NA NA 9908
# 19 1907-01-01 A 9909 2 NA NA 9909
# 20 1907-01-01 A 9910 2 NA NA 9910
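My dplyr version is 0.4.3 and R is 3.1.3.
Is this a bug, or am I missing something? As far as I remember, I did not have this problem with dplyr 0.4.1, before I updated a few weeks ago.
How can I work around it for now?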
Answer 0 (score: 1)
A workaround is to use the mapvalues function from plyr to replace the NAs with zeros.
For v2 only (the cumsum column):
library(plyr)
datdf %>% mutate(v1 = val + 100,
v2 = cumsum(val %>% mapvalues(NA, 0)),
v3 = val)
Output:
BM ct val facet v1 v2 v3
(chr) (fctr) (int) (dbl) (dbl) (dbl) (int)
1 1907-01-01 B NA 1 NA 0 NA
2 1907-01-01 B NA 1 NA 0 NA
3 1907-01-01 B NA 1 NA 0 NA
4 1907-01-01 B NA 1 NA 0 NA
5 1907-01-01 B NA 1 NA 0 NA
6 1907-01-01 A NA 1 NA 0 NA
7 1907-01-01 A NA 1 NA 0 NA
8 1907-01-01 A NA 1 NA 0 NA
9 1907-01-01 A NA 1 NA 0 NA
10 1907-01-01 A NA 1 NA 0 NA
11 1907-01-01 B 9901 2 10001 9901 9901
12 1907-01-01 B 9902 2 10002 19803 9902
13 1907-01-01 B 9903 2 10003 29706 9903
14 1907-01-01 B 9904 2 10004 39610 9904
15 1907-01-01 B 9905 2 10005 49515 9905
16 1907-01-01 A 9906 2 10006 59421 9906
17 1907-01-01 A 9907 2 10007 69328 9907
18 1907-01-01 A 9908 2 10008 79236 9908
19 1907-01-01 A 9909 2 10009 89145 9909
20 1907-01-01 A 9910 2 10010 99055 9910
For all columns:
datdf %>% mutate(v1 = val %>% mapvalues(NA, 0) + 100,
v2 = cumsum(val %>% mapvalues(NA, 0)),
v3 = val %>% mapvalues(NA, 0))
Output:
BM ct val facet v1 v2 v3
(chr) (fctr) (int) (dbl) (dbl) (dbl) (dbl)
1 1907-01-01 B NA 1 100 0 0
2 1907-01-01 B NA 1 100 0 0
3 1907-01-01 B NA 1 100 0 0
4 1907-01-01 B NA 1 100 0 0
5 1907-01-01 B NA 1 100 0 0
6 1907-01-01 A NA 1 100 0 0
7 1907-01-01 A NA 1 100 0 0
8 1907-01-01 A NA 1 100 0 0
9 1907-01-01 A NA 1 100 0 0
10 1907-01-01 A NA 1 100 0 0
11 1907-01-01 B 9901 2 10001 9901 9901
12 1907-01-01 B 9902 2 10002 19803 9902
13 1907-01-01 B 9903 2 10003 29706 9903
14 1907-01-01 B 9904 2 10004 39610 9904
15 1907-01-01 B 9905 2 10005 49515 9905
16 1907-01-01 A 9906 2 10006 59421 9906
17 1907-01-01 A 9907 2 10007 69328 9907
18 1907-01-01 A 9908 2 10008 79236 9908
19 1907-01-01 A 9909 2 10009 89145 9909
20 1907-01-01 A 9910 2 10010 99055 9910
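Note that both snippets above call mutate without group_by, so v2 is a running total over all 20 rows (row 16 shows 59421 = 49515 + 9906) rather than restarting in each group. If a per-group cumsum is what is needed, a minimal sketch in base R, which avoids dplyr altogether, might look like the following. It assumes NAs should count as zero; val0 is a helper name introduced here for illustration, and the data frame is a simplified plain data.frame copy of datdf without the constant BM column.

```r
# Simplified copy of the question's data (assumption: BM is constant, so it is left out)
datdf <- data.frame(
  ct    = rep(c("B", "A"), each = 5, times = 2),
  val   = c(rep(NA, 10), 9901:9910),
  facet = rep(c(1, 2), each = 10)
)

# Treat NA as zero (assumption), keeping the vector the same length as val
val0 <- replace(datdf$val, is.na(datdf$val), 0L)

# ave() applies cumsum within each facet/ct combination and returns
# the results in the original row order
datdf$v2 <- ave(val0, datdf$facet, datdf$ct, FUN = cumsum)

datdf$v2[11:15]  # facet 2, ct B: 9901 19803 29706 39610 49515
datdf$v2[16:20]  # facet 2, ct A: 9906 19813 29721 39630 49540
```

Here v2 restarts at 9906 for group A instead of continuing from 49515.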
Answer 1 (score: 1)
You may be running into this issue: https://github.com/hadley/dplyr/issues/1448#issuecomment-150037548
Try this:
datdf %>% group_by(BM, facet,ct) %>% plyr::mutate(v1 = val + 100, v2 = cumsum(val[!is.na(val)]), v3 = val)
BM ct val facet v1 v2 v3
(chr) (fctr) (int) (dbl) (dbl) (int) (int)
11 1907-01-01 B 9901 2 10001 9901 9901
12 1907-01-01 B 9902 2 10002 19803 9902
13 1907-01-01 B 9903 2 10003 29706 9903
14 1907-01-01 B 9904 2 10004 39610 9904
15 1907-01-01 B 9905 2 10005 49515 9905
16 1907-01-01 A 9906 2 10006 59421 9906
17 1907-01-01 A 9907 2 10007 69328 9907
18 1907-01-01 A 9908 2 10008 79236 9908
19 1907-01-01 A 9909 2 10009 89145 9909
20 1907-01-01 A 9910 2 10010 99055 9910
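One thing to keep in mind with val[!is.na(val)]: the subset drops the NA elements, so the cumsum result is shorter than val. The small base R sketch below illustrates the difference on a made-up four-element vector (the names val, dropped and zeroed are illustrative only) and shows a length-preserving alternative that treats NA as zero.

```r
# Toy vector, not the question's data
val <- c(NA, NA, 3L, 4L)

# Subsetting removes the NAs first, so the result has only 2 elements
dropped <- cumsum(val[!is.na(val)])
length(dropped)  # 2

# Replacing NA with 0 keeps one result per input element
zeroed <- cumsum(replace(val, is.na(val), 0L))
zeroed  # 0 0 3 7
```

The second form is the safer choice when the result has to line up row by row with the original column.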