dplyr: compute the difference between values in the first column based on a sequence in the second column

Time: 2017-05-16 12:31:01

Tags: r dplyr

age_member <- c(1975, 1980, 1979, 1985, 1993, 1998)
people <- c("male", "female", "male", "female", "male", "children")

dataset <- data.frame(age_member, people)

My result:

age_member    people
1975      male          
1980    female          
1979    male            
1985    female          
1993    male            
1998    children

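Following akrun's answer to "dplyr : filter a sequence of rows (in one column)", I filtered for the sequence male (first), female (second).
I did not keep other sequences, such as male (first), children (second).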

What I want: create a new column (with mutate) holding the age difference within each male/female pair.

dataset %>%
   filter(first(people)=="male", last(people) == "female", n()==2)

Expected result:

age_member    people   ages_diff
1975    male            5
1980    female          NA
1979    male            6
1985    female          NA

What I tried:

dataset2 <-dataset %>%
   mutate(ifelse(first(people)=="male", last(people) == "female",n()==2), last(age)- first(age))

1 answer:

Answer 0 (score: 1)

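We can try grouping by a running count of the "male" rows, so that every group starts at a male row. Then keep only the two-row groups that are male followed by female, and take the difference of age_member within each group, padding with NA for the second row: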

library(dplyr)
dataset %>%
      group_by(ind = cumsum(people == "male")) %>% 
      filter(first(people)=="male", last(people) == "female", n()==2) %>% 
      mutate(ages_diff = c(diff(age_member), NA)) %>% 
      ungroup() %>%
      select(-ind)
# A tibble: 4 x 3
#  age_member people ages_diff
#       <dbl> <fctr>     <dbl>
#1       1975   male         5
#2       1980 female        NA
#3       1979   male         6
#4       1985 female        NA
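
To see why the grouping works, it may help to look at the intermediate pieces on their own. The sketch below rebuilds the data from the question, prints the group index produced by cumsum(), and then repeats the pipeline using lead() in place of c(diff(), NA). The lead() variant is an alternative swapped in for illustration, not the method from the answer, and the names ind and result are hypothetical.

```r
library(dplyr)

# Same data as in the question
age_member <- c(1975, 1980, 1979, 1985, 1993, 1998)
people <- c("male", "female", "male", "female", "male", "children")
dataset <- data.frame(age_member, people)

# Step 1: the counter increases at every "male" row,
# so each group starts with a male
ind <- cumsum(dataset$people == "male")
ind
# [1] 1 1 2 2 3 3

# Step 2: for a two-row group, diff() returns a single value,
# so NA is appended to match the group length
c(diff(c(1975, 1980)), NA)
# [1]  5 NA

# Alternative (assumption, not from the answer): lead() shifts
# age_member up by one row within each group, giving NA on the last row
result <- dataset %>%
  group_by(ind = cumsum(people == "male")) %>%
  filter(first(people) == "male", last(people) == "female", n() == 2) %>%
  mutate(ages_diff = lead(age_member) - age_member) %>%
  ungroup() %>%
  select(-ind)
```

The third group (male, children) is dropped by the filter() step because its last row is not "female", which leaves the four rows shown in the expected result.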