R dplyr: computing the first difference within two grouping variables

Posted: 2018-07-15 22:30:27

Tags: r dplyr

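I have a dataset with 3 variables. Two are factor variables (Policy_num and Presidentnumber). The third is a continuous value (pred). I want to create a new variable that is the first difference of pred for each combination of Presidentnumber and Policy_num. The code below runs, but for me it only gives the first difference of pred by president number. The data frame is called dydx. This seems simple, but I'm stuck.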

library(dplyr)

# First difference of pred within each Policy_num x presidentnumber group
newobject2 <- dydx %>%
   group_by(Policy_num, presidentnumber) %>%
   mutate(dydx2 = pred - lag(pred))

This produces:

   ob Polic_num    Pres    pred     dydx2
   1 SocialWelfare Reagan  5.215365  NA
   2 SocialWelfare Reagan  4.373108 -0.8422576
   3 Agriculture   Reagan  5.180910  0.8078020
   4 Agriculture   Reagan  4.338652 -0.8422576
   5 Commerce      Reagan  5.206816  0.8681638
   6 Commerce      Reagan  4.364558 -0.8422576

It should look like this:

ob Polic_num    Pres    pred     dydx2
 1 SocialWelfare Reagan  5.215365  NA
 2 SocialWelfare Reagan  4.373108 -0.8422576
 3 Agriculture   Reagan  5.180910  NA
 4 Agriculture   Reagan  4.338652 -0.8422576
 5 Commerce      Reagan  5.206816  NA
 6 Commerce      Reagan  4.364558 -0.8422576

Here is the code for a reproducible example:

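library(dplyr)

# Toy data: 3 presidents x 2 policy areas, 2 observations per group
presidentnumber <- c("Reagan", "Reagan", "Reagan", "Reagan", "Bush", "Bush",
                     "Bush", "Bush", "Clinton", "Clinton", "Clinton", "Clinton")
Policy_num <- c("Agriculture", "Agriculture", "Social", "Social", "Agriculture",
                "Agriculture", "Social", "Social", "Agriculture", "Agriculture",
                "Social", "Social")
pred <- 1:12
# stringsAsFactors = TRUE keeps the grouping columns as factors (needed in R >= 4.0)
ND <- data.frame(presidentnumber, Policy_num, pred, stringsAsFactors = TRUE)

# First difference of pred within each Policy_num x presidentnumber group
newobject4 <- ND %>%
  group_by(Policy_num, presidentnumber) %>%
  mutate(dydx2 = c(NA, diff(pred)))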

This is what it produces:

  Obs presidentnum Policy_num pred dydx2
  1   Reagan       Agriculture 1   NA
  2   Reagan       Agriculture 2   1
  3   Reagan       Social      3   1
  4   Reagan       Social      4   1
  5   Bush         Agriculture 5   1
  6   Bush         Agriculture 6   1
  7   Bush         Social      7   1
  8   Bush         Social      8   1
  9   Clinton      Agriculture 9   1
 10   Clinton      Agriculture 10  1
 11   Clinton      Social      11  1
 12   Clinton      Social      12  1

However, every other value of dydx2 above should be NA (the first row of each group).

1 answer:

Answer 0 (score: 1)

So, when I run your reproducible code like this:

require(dplyr)
newobject4 <- ND %>% group_by(Policy_num, presidentnumber ) %>% mutate(dydx2 = c(NA, diff(pred)))

newobject4
# A tibble: 12 x 4
# Groups:   Policy_num, presidentnumber [6]
   presidentnumber Policy_num   pred dydx2
   <fct>           <fct>       <int> <int>
 1 Reagan          Agriculture     1    NA
 2 Reagan          Agriculture     2     1
 3 Reagan          Social          3    NA
 4 Reagan          Social          4     1
 5 Bush            Agriculture     5    NA
 6 Bush            Agriculture     6     1
 7 Bush            Social          7    NA
 8 Bush            Social          8     1
 9 Clinton         Agriculture     9    NA
10 Clinton         Agriculture    10     1
11 Clinton         Social         11    NA
12 Clinton         Social         12     1
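As a cross-check that does not depend on which package's mutate() is attached, the same within-group first difference can be computed in base R with ave(), which applies FUN separately to every Policy_num x presidentnumber combination and returns a vector aligned with the original rows. This is a swapped-in base-R technique, not part of the original answer; the example data are rebuilt inline with rep() so the snippet runs on its own.

```r
# Rebuild the example data (same values as ND in the question)
presidentnumber <- rep(c("Reagan", "Bush", "Clinton"), each = 4)
Policy_num <- rep(rep(c("Agriculture", "Social"), each = 2), times = 3)
pred <- 1:12

# ave() splits pred by every Policy_num x presidentnumber combination,
# applies FUN within each group, and returns the results in the original row order
dydx2 <- ave(pred, Policy_num, presidentnumber,
             FUN = function(x) c(NA, diff(x)))

print(data.frame(presidentnumber, Policy_num, pred, dydx2))
```

The first row of each group gets NA, matching the expected output in the question.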

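Then, in the same session, where dplyr is already attached (so require(dplyr) does not re-attach it, and plyr now masks dplyr's mutate()):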

require(plyr); require(dplyr)
newobject4 <- ND %>% group_by(Policy_num, presidentnumber ) %>% mutate(dydx2 = c(NA, diff(pred)))
newobject4
# A tibble: 12 x 4
# Groups:   Policy_num, presidentnumber [6]
   presidentnumber Policy_num   pred dydx2
   <fct>           <fct>       <int> <int>
 1 Reagan          Agriculture     1    NA
 2 Reagan          Agriculture     2     1
 3 Reagan          Social          3     1
 4 Reagan          Social          4     1
 5 Bush            Agriculture     5     1
 6 Bush            Agriculture     6     1
 7 Bush            Social          7     1
 8 Bush            Social          8     1
 9 Clinton         Agriculture     9     1
10 Clinton         Agriculture    10     1
11 Clinton         Social         11     1
12 Clinton         Social         12     1
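One way to confirm this kind of masking (a diagnostic sketch, not part of the original answer) is utils::find(), which lists the search-path entries that contain an object with a given name, in search order. An unqualified mutate() called from the global environment resolves to the first entry, so after the session above it should list package:plyr before package:dplyr.

```r
# Every attached package (search-path entry) that defines "mutate", in search order;
# an unqualified mutate() call uses the first one.
# Returns character(0) if neither plyr nor dplyr is attached.
where_mutate <- utils::find("mutate")
print(where_mutate)

# Sanity check with a base function that is always on the search path
where_diff <- utils::find("diff")
print(where_diff)
```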

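The suggestion in the comments above, that you may have loaded dplyr before plyr, is probably correct, although it may have happened indirectly: before loading plyr, you may have loaded another package that depends on dplyr. To work around this, call dplyr's mutate() explicitly with the dplyr:: prefix: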

newobject4 <- ND %>% group_by(Policy_num, presidentnumber ) %>% dplyr::mutate(dydx2 = c(NA, diff(pred))) 
# A tibble: 12 x 4
# Groups:   Policy_num, presidentnumber [6]
   presidentnumber Policy_num   pred dydx2
   <fct>           <fct>       <int> <int>
 1 Reagan          Agriculture     1    NA
 2 Reagan          Agriculture     2     1
 3 Reagan          Social          3    NA
 4 Reagan          Social          4     1
 5 Bush            Agriculture     5    NA
 6 Bush            Agriculture     6     1
 7 Bush            Social          7    NA
 8 Bush            Social          8     1
 9 Clinton         Agriculture     9    NA
10 Clinton         Agriculture    10     1
11 Clinton         Social         11    NA
12 Clinton         Social         12     1
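The same namespace qualification can also be applied to the question's original lag()-based code. The sketch below (not from the original answer; newobject5 is a hypothetical name) rebuilds ND compactly with rep() and prefixes group_by(), mutate() and lag() with dplyr::, so neither an attached plyr nor stats::lag can take over those calls. It assumes dplyr is installed. Alternatively, in a fresh session, attaching plyr before dplyr puts dplyr's mutate() first on the search path.

```r
suppressPackageStartupMessages(library(dplyr))

# Same data as ND in the question, built with rep()
ND <- data.frame(
  presidentnumber = rep(c("Reagan", "Bush", "Clinton"), each = 4),
  Policy_num = rep(rep(c("Agriculture", "Social"), each = 2), times = 3),
  pred = 1:12,
  stringsAsFactors = TRUE
)

# Namespace-qualified verbs: an attached plyr cannot mask mutate(),
# and stats::lag cannot replace dplyr::lag
newobject5 <- ND %>%
  dplyr::group_by(Policy_num, presidentnumber) %>%
  dplyr::mutate(dydx2 = pred - dplyr::lag(pred)) %>%
  dplyr::ungroup()

print(newobject5)
```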