How to keep dplyr::summarize from reordering results in R

Time: 2018-10-23 13:01:36

Tags: r dplyr

I want to summarize relocations (between cities) based on a unique ID number. An example data frame with two unique IDs:

  year ID city   adress
1 2013  1    B adress_1
2 2014  1    B adress_1
3 2015  1    A adress_2
4 2016  1    A adress_2
5 2013  2    B adress_3
6 2014  2    B adress_3
7 2015  2    C adress_4
8 2016  2    C adress_4

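I have provided example code below. The summary is correct except for one thing. If a relocation is found between city B and city A, I want the output to show a relocation from city B to city A (with a count of 1, because it is seen once in the data frame). However, because of how summarize works (it tends to return the grouped output in alphabetical order), I get the following output: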

tmp <- df %>% group_by(ID, city, adress) %>% summarize(numberofyears = n())

tmp <- tmp %>% 
  group_by(ID) %>% 
  #filter(n() >1) %>% 
  mutate(from = city[1], from_adres = adress[1], from_years = numberofyears[1],  to = city[2],
  to_adres = adress[2], to_years = numberofyears[2]) %>% 
  distinct(ID, .keep_all = TRUE) %>% select(-c(2:3))


# A tibble: 2 x 8
# Groups:   ID [2]
     ID numberofyears from  from_adres from_years to    to_adres to_years
  <dbl>         <int> <fct> <fct>           <int> <fct> <fct>       <int>
1     1             2 A     adress_2            2 B     adress_1        2
2     2             2 B     adress_3            2 C     adress_4        2

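This is wrong for ID 1, because we know that adress_1 comes before adress_2, so the move should be reported as B to A. For the relocation from city B to city C, I get the correct result.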

It is a small detail but, as I have tried to demonstrate, an important one. Any suggestions would be greatly appreciated!

2 answers:

Answer 0: (score: 1)

Something like this?

    library(tidyverse)
    df<-read.table(text=" year ID city   adress
                1 2013  1    B adress_1
                2 2014  1    B adress_1
                3 2015  1    A adress_2
                4 2016  1    A adress_2
                5 2013  2    B adress_3
                6 2014  2    B adress_3
                7 2015  2    C adress_4
                8 2016  2    C adress_4",header=T)
    df%>%
       group_by(ID, city, adress)%>%
       summarize(numberofyears = n())%>%
       mutate(id=parse_number(adress))%>%
       group_by(ID,id)%>%
       arrange(id)%>%
       ungroup()%>%
       select(-id)%>%
       group_by(ID)%>%
       mutate(from=first(city), from_adres = first(adress),
              from_years = first(numberofyears),to=last(city),
              to_adres = last(adress),to_years=last(numberofyears))%>%
       distinct(ID, .keep_all = TRUE)%>%
       select(-c(2:3))
    # A tibble: 2 x 8
    # Groups:   ID [2]
         ID numberofyears from  from_adres from_years to    to_adres to_years
      <int>         <int> <fct> <fct>           <int> <fct> <fct>       <int>
    1     1             2 B     adress_1            2 A     adress_2        2
    2     2             2 B     adress_3            2 C     adress_4        2
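
The pipeline above sorts by the number embedded in adress, which only works when the address labels happen to be numbered in chronological order. A minimal sketch of a variant, assuming instead that the earliest year of each stay defines the order. The names first_year and moves are introduced here for illustration, and the .groups argument assumes dplyr 1.0.0 or later:

```r
library(dplyr)

df <- read.table(text = "year ID city adress
2013 1 B adress_1
2014 1 B adress_1
2015 1 A adress_2
2016 1 A adress_2
2013 2 B adress_3
2014 2 B adress_3
2015 2 C adress_4
2016 2 C adress_4", header = TRUE, stringsAsFactors = FALSE)

moves <- df %>%
  group_by(ID, city, adress) %>%
  # keep the first year of each stay so the chronological order survives
  summarise(numberofyears = n(), first_year = min(year), .groups = "drop") %>%
  # restore chronological order within each ID (assumption: year defines it)
  arrange(ID, first_year) %>%
  group_by(ID) %>%
  summarise(
    from = first(city), from_adres = first(adress),
    from_years = first(numberofyears),
    to = last(city), to_adres = last(adress),
    to_years = last(numberofyears),
    .groups = "drop"
  )

print(moves)
```

With the example data this reports ID 1 as moving from B to A, because the ordering no longer depends on how the cities or addresses sort.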

Answer 1: (score: 1)

Similar to @jyjek's answer, but this allows for the possibility of more than one move per ID.
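
A minimal sketch of one way to handle several moves per ID; it is not necessarily the code this answer originally contained. It assumes the earliest year of each stay defines the order, and it pairs every stay with the next one using lead(). The data frame df2 (with a made-up second move to city C), the names stays, relocations and first_year, and the use of .groups (dplyr 1.0.0 or later) are all assumptions for illustration:

```r
library(dplyr)

# hypothetical data: ID 1 moves twice (B -> A -> C)
df2 <- data.frame(
  year   = c(2013, 2014, 2015, 2016, 2017),
  ID     = 1,
  city   = c("B", "B", "A", "A", "C"),
  adress = c("adress_1", "adress_1", "adress_2", "adress_2", "adress_5"),
  stringsAsFactors = FALSE
)

# one row per stay, put back into chronological order within each ID
stays <- df2 %>%
  group_by(ID, city, adress) %>%
  summarise(numberofyears = n(), first_year = min(year), .groups = "drop") %>%
  arrange(ID, first_year)

# pair each stay with the following stay of the same ID
relocations <- stays %>%
  group_by(ID) %>%
  mutate(to = lead(city),
         to_adres = lead(adress),
         to_years = lead(numberofyears)) %>%
  ungroup() %>%
  filter(!is.na(to)) %>%  # the last stay of each ID has no destination
  select(ID, from = city, from_adres = adress,
         from_years = numberofyears, to, to_adres, to_years)

print(relocations)
```

This gives one row per move rather than one row per ID. Note that grouping by ID, city and adress merges repeated stays at the same address, so someone who moves back to an earlier address would need a different way of identifying stays.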
