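I have a dataset with 3 variables. Two are factor variables (Policy_num and presidentnumber). The third is a continuous value (pred). I want to create a new variable that is the first difference of pred within each combination of presidentnumber and Policy_num. The code below runs, but for me it only produces the first difference of pred by president, not by president and policy. The data frame is named dydx. This seems like it should be simple, but I'm stumped.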
newobject2 = dydx %>%
  group_by(Policy_num, presidentnumber) %>%
  mutate(dydx2 = pred - lag(pred))
This produces:
ob Polic_num Pres pred dydx2
1 SocialWelfare Reagan 5.215365 NA
2 SocialWelfare Reagan 4.373108 -0.8422576
3 Agriculture Reagan 5.180910 0.8078020
4 Agriculture Reagan 4.338652 -0.8422576
5 Commerce Reagan 5.206816 0.8681638
6 Commerce Reagan 4.364558 -0.8422576
It should look like this:
ob Polic_num Pres pred dydx2
1 SocialWelfare Reagan 5.215365 NA
2 SocialWelfare Reagan 4.373108 -0.8422576
3 Agriculture Reagan 5.180910 NA
4 Agriculture Reagan 4.338652 -0.8422576
5 Commerce Reagan 5.206816 NA
6 Commerce Reagan 4.364558 -0.8422576
Here is the code for a reproducible example.
presidentnumber = c("Reagan", "Reagan", "Reagan", "Reagan", "Bush", "Bush",
"Bush", "Bush", "Clinton", "Clinton", "Clinton", "Clinton")
Policy_num=c("Agriculture", "Agriculture", "Social", "Social","Agriculture",
"Agriculture", "Social", "Social","Agriculture", "Agriculture", "Social",
"Social")
pred=seq(1:12)
ND=as.data.frame(cbind.data.frame(presidentnumber, Policy_num, pred))
newobject4 = ND %>%
  group_by(Policy_num, presidentnumber) %>%
  mutate(dydx2 = c(NA, diff(pred)))
This is what it produces:
Obs presidentnum Policy_num pred dydx2
1 Reagan Agriculture 1 NA
2 Reagan Agriculture 2 1
3 Reagan Social 3 1
4 Reagan Social 4 1
5 Bush Agriculture 5 1
6 Bush Agriculture 6 1
7 Bush Social 7 1
8 Bush Social 8 1
9 Clinton Agriculture 9 1
10 Clinton Agriculture 10 1
11 Clinton Social 11 1
12 Clinton Social 12 1
However, the first row of each group, i.e. every other row above, should be NA.
Answer 0 (score: 1)
So, when I run your reproducible code with only dplyr loaded:
require(dplyr)
newobject4 <- ND %>% group_by(Policy_num, presidentnumber ) %>% mutate(dydx2 = c(NA, diff(pred)))
newobject4
# A tibble: 12 x 4
# Groups: Policy_num, presidentnumber [6]
presidentnumber Policy_num pred dydx2
<fct> <fct> <int> <int>
1 Reagan Agriculture 1 NA
2 Reagan Agriculture 2 1
3 Reagan Social 3 NA
4 Reagan Social 4 1
5 Bush Agriculture 5 NA
6 Bush Agriculture 6 1
7 Bush Social 7 NA
8 Bush Social 8 1
9 Clinton Agriculture 9 NA
10 Clinton Agriculture 10 1
11 Clinton Social 11 NA
12 Clinton Social 12 1
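Then, in the same session, after also loading plyr (dplyr is already attached at this point, so plyr ends up ahead of it on the search path and its functions take precedence):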
require(plyr); require(dplyr)
newobject4 <- ND %>% group_by(Policy_num, presidentnumber ) %>% mutate(dydx2 = c(NA, diff(pred)))
newobject4
# A tibble: 12 x 4
# Groups: Policy_num, presidentnumber [6]
presidentnumber Policy_num pred dydx2
<fct> <fct> <int> <int>
1 Reagan Agriculture 1 NA
2 Reagan Agriculture 2 1
3 Reagan Social 3 1
4 Reagan Social 4 1
5 Bush Agriculture 5 1
6 Bush Agriculture 6 1
7 Bush Social 7 1
8 Bush Social 8 1
9 Clinton Agriculture 9 1
10 Clinton Agriculture 10 1
11 Clinton Social 11 1
12 Clinton Social 12 1
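If you suspect that the difference between the two runs comes from one package masking another's function, one way to check is to ask R which attached packages define `mutate`. This is only a minimal sketch using `utils::find`; the variable name `hits` is mine, and what it prints depends entirely on which packages are attached in your session.

```r
# List every attached package that defines a function named `mutate`,
# in search-path order. A bare call to mutate() uses the first entry.
hits <- utils::find("mutate", mode = "function")
print(hits)

# An empty result means no attached package defines mutate() at all.
if (length(hits) > 1) {
  message("mutate() is defined more than once; the first entry wins: ", hits[1])
}
```

In a session like the second one above, I would expect plyr to be listed ahead of dplyr, but verify that on your own machine rather than taking it from me.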
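The suggestion in the comments above, that you loaded dplyr before plyr, may well be right, or it may have happened indirectly: before loading plyr, you may have loaded another package that depends on dplyr. Either way, plyr's mutate then masks dplyr's mutate, and plyr's version ignores the grouping set by group_by. To fix this, call dplyr's version explicitly: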
newobject4 <- ND %>% group_by(Policy_num, presidentnumber ) %>% dplyr::mutate(dydx2 = c(NA, diff(pred)))
# A tibble: 12 x 4
# Groups: Policy_num, presidentnumber [6]
presidentnumber Policy_num pred dydx2
<fct> <fct> <int> <int>
1 Reagan Agriculture 1 NA
2 Reagan Agriculture 2 1
3 Reagan Social 3 NA
4 Reagan Social 4 1
5 Bush Agriculture 5 NA
6 Bush Agriculture 6 1
7 Bush Social 7 NA
8 Bush Social 8 1
9 Clinton Agriculture 9 NA
10 Clinton Agriculture 10 1
11 Clinton Social 11 NA
12 Clinton Social 12 1
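As a cross-check that does not depend on which package's `mutate` is in effect, the same grouped first difference can be sketched in base R with `ave()`. This is an alternative technique, not the dplyr approach above; the data frame is rebuilt here from the question's example so the snippet runs on its own.

```r
# Rebuild the example data from the question (base R only, no packages needed).
ND <- data.frame(
  presidentnumber = rep(c("Reagan", "Bush", "Clinton"), each = 4),
  Policy_num      = rep(rep(c("Agriculture", "Social"), each = 2), times = 3),
  pred            = 1:12
)

# ave() applies FUN separately within each combination of the grouping
# variables and returns the results in the original row order.
ND$dydx2 <- ave(ND$pred, ND$Policy_num, ND$presidentnumber,
                FUN = function(x) c(NA, diff(x)))

print(ND)
```

If the dplyr result and this one disagree, that points to a masking problem rather than to the grouping logic itself.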