Create a column based on date

Time: 2017-08-25 13:11:28

Tags: r date dataframe dplyr

My data looks similar to the following, but is much larger.

boat = c(1,1,1,1,1,1,1,2,2,2,2,2,2)
species = c("cod", "haddock", "ling", 
       "cod", "haddock", "ling", "tusk", 
       "cod", "haddock", "ling",
       "cod", "haddock", "ling")
date = c(as.Date(c("1.03.2017","1.03.2017","1.03.2017",
               "2.03.2017", "2.03.2017", "2.03.2017","2.03.2017",
               "4.03.2017","4.03.2017","4.03.2017",
               "7.03.2017", "7.03.2017", "7.03.2017"), "%d.%m.%Y"))
df <- data.frame(boat, species, date)

    df
    boat  species  date
    1     cod      01.03.2017
    1     haddock  01.03.2017
    1     ling     01.03.2017
    1     cod      02.03.2017
    1     haddock  02.03.2017
    1     ling     02.03.2017
    1     tusk     02.03.2017
    2     cod      04.03.2017
    2     haddock  04.03.2017
    2     ling     04.03.2017
    2     cod      07.03.2017
    2     haddock  07.03.2017
    2     ling     07.03.2017

I want to create an additional column that ranks the dates in order within each boat, so that my dataset looks like this.

    boat  species  date       rank
    1     cod      01.03.2017 1
    1     haddock  01.03.2017 1
    1     ling     01.03.2017 1
    1     cod      02.03.2017 2
    1     haddock  02.03.2017 2
    1     ling     02.03.2017 2
    1     tusk     02.03.2017 2
    2     cod      04.03.2017 1
    2     haddock  04.03.2017 1
    2     ling     04.03.2017 1
    2     cod      07.03.2017 2
    2     haddock  07.03.2017 2
    2     ling     07.03.2017 2

I tried the following code

library(dplyr)

df %>%
  group_by(boat, species) %>%
  mutate(Order = rank(date))

However, a species that has not appeared before is given rank "1" the first time it appears. Any help is appreciated.
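As a hedged illustration of why the attempt above behaves this way: grouping by both boat and species means each species is ranked only against its own rows, so a species seen for the first time on a later date still starts at 1. The small data frame `demo` below is a made-up subset of boat 1, not the real data.

```r
library(dplyr)

# Hypothetical subset of boat 1: tusk first appears on the second date
demo <- data.frame(
  boat = c(1, 1, 1, 1),
  species = c("cod", "cod", "haddock", "tusk"),
  date = as.Date(c("2017-03-01", "2017-03-02", "2017-03-01", "2017-03-02"))
)

attempt <- demo %>%
  group_by(boat, species) %>%
  mutate(Order = rank(date)) %>%
  ungroup()

# tusk has a single row in its (boat, species) group, so it is ranked 1
# even though 2017-03-02 is the second date for this boat
attempt$Order[attempt$species == "tusk"]
```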

4 answers:

Answer 0: (score: 5)

We can use group_by and dense_rank from dplyr to create the desired output.

library(dplyr)

df2 <- df %>%
  group_by(boat) %>%
  mutate(rank = dense_rank(date))

df2
# A tibble: 13 x 4
# Groups:   boat [2]
    boat species       date  rank
   <dbl>  <fctr>     <date> <int>
 1     1     cod 2017-03-01     1
 2     1 haddock 2017-03-01     1
 3     1    ling 2017-03-01     1
 4     1     cod 2017-03-02     2
 5     1 haddock 2017-03-02     2
 6     1    ling 2017-03-02     2
 7     1    tusk 2017-03-02     2
 8     2     cod 2017-03-04     1
 9     2 haddock 2017-03-04     1
10     2    ling 2017-03-04     1
11     2     cod 2017-03-07     2
12     2 haddock 2017-03-07     2
13     2    ling 2017-03-07     2
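The choice of dense_rank matters here because the dates are tied within a boat. A minimal sketch of how the ranking functions treat ties, using a made-up vector `d` that mirrors boat 1's dates:

```r
library(dplyr)

# Made-up dates mirroring boat 1: three rows on one day, four on the next
d <- as.Date(c(rep("2017-03-01", 3), rep("2017-03-02", 4)))

avg_ranks   <- rank(d)        # ties get the average rank: 2 2 2 5.5 5.5 5.5 5.5
min_ranks   <- min_rank(d)    # ties get the lowest rank, with gaps: 1 1 1 4 4 4 4
dense_ranks <- dense_rank(d)  # ties share a rank, no gaps: 1 1 1 2 2 2 2
```

Only the gap-free version gives the consecutive 1, 2, ... numbering asked for in the question.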

Answer 1: (score: 1)

library(dplyr)

# Rank the unique boat/date pairs within each boat, then join the ranks back onto df
left_join(df,
          unique(df[, c("boat", "date")]) %>%
            group_by(boat) %>%
            mutate(Order = rank(date)))


##    boat species       date Order
## 1     1     cod 2017-03-01     1
## 2     1 haddock 2017-03-01     1
## 3     1    ling 2017-03-01     1
## 4     1     cod 2017-03-02     2
## 5     1 haddock 2017-03-02     2
## 6     1    ling 2017-03-02     2
## 7     1    tusk 2017-03-02     2
## 8     2     cod 2017-03-04     1
## 9     2 haddock 2017-03-04     1
## 10    2    ling 2017-03-04     1
## 11    2     cod 2017-03-07     2
## 12    2 haddock 2017-03-07     2
## 13    2    ling 2017-03-07     2
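Without a `by` argument, left_join joins on every column the two tables share (here boat and date) and prints a message saying so. Below is a sketch of the same idea with the join keys spelled out, using dplyr's distinct in place of unique. The names `date_ranks` and `joined` are introduced here for illustration, and `df` is rebuilt from the question so the block runs on its own.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  boat = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  species = c("cod", "haddock", "ling",
              "cod", "haddock", "ling", "tusk",
              "cod", "haddock", "ling",
              "cod", "haddock", "ling"),
  date = as.Date(c("1.03.2017", "1.03.2017", "1.03.2017",
                   "2.03.2017", "2.03.2017", "2.03.2017", "2.03.2017",
                   "4.03.2017", "4.03.2017", "4.03.2017",
                   "7.03.2017", "7.03.2017", "7.03.2017"),
                 format = "%d.%m.%Y")
)

# One row per boat/date pair, ranked within each boat
date_ranks <- df %>%
  distinct(boat, date) %>%
  group_by(boat) %>%
  mutate(Order = rank(date)) %>%
  ungroup()

# Join the ranks back onto every row, naming the keys explicitly
joined <- left_join(df, date_ranks, by = c("boat", "date"))
```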

Answer 2: (score: 1)

Solution

df %>%
  group_by(boat) %>%
  mutate(Order = cumsum(lag(date, default = head(date, 1)) != date) + 1)

Output

    boat species       date Order
 1     1     cod 2017-03-01     1
 2     1 haddock 2017-03-01     1
 3     1    ling 2017-03-01     1
 4     1     cod 2017-03-02     2
 5     1 haddock 2017-03-02     2
 6     1    ling 2017-03-02     2
 7     1    tusk 2017-03-02     2
 8     2     cod 2017-03-04     1
 9     2 haddock 2017-03-04     1
10     2    ling 2017-03-04     1
11     2     cod 2017-03-07     2
12     2 haddock 2017-03-07     2
13     2    ling 2017-03-07     2
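Note that the cumsum/lag approach counts changes between consecutive rows, so it appears to rely on the rows already being sorted by date within each boat, as they are in the question. The hedged sketch below uses a made-up, unsorted data frame `unsorted` and a helper `change_count` (a name introduced here) to show the difference, and that sorting with arrange first restores the expected numbering.

```r
library(dplyr)

# Made-up data for one boat, with dates out of order
unsorted <- data.frame(
  boat = c(1, 1, 1),
  date = as.Date(c("2017-03-02", "2017-03-01", "2017-03-02"))
)

# Same expression as in the answer, wrapped in a helper for reuse
change_count <- function(d) cumsum(dplyr::lag(d, default = head(d, 1)) != d) + 1

as_is <- unsorted %>%
  group_by(boat) %>%
  mutate(Order = change_count(date), dense = dense_rank(date)) %>%
  ungroup()
# Order is 1 2 3 (one increment per change), dense is 2 1 2

sorted_first <- unsorted %>%
  arrange(boat, date) %>%
  group_by(boat) %>%
  mutate(Order = change_count(date)) %>%
  ungroup()
# Order is 1 2 2 once the rows are in date order
```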

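Answer 3: (score: 1)

In base R, you can use ave for group-level calculations, and use cumsum, diff, and sign to perform those calculations on integers constructed from the date variable.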

df$rank <- ave(as.integer(df$date),
               df$boat, FUN=function(x) cumsum(c(1, sign(diff(x)))))

This returns

df
   boat species       date rank
1     1     cod 2017-03-01    1
2     1 haddock 2017-03-01    1
3     1    ling 2017-03-01    1
4     1     cod 2017-03-02    2
5     1 haddock 2017-03-02    2
6     1    ling 2017-03-02    2
7     1    tusk 2017-03-02    2
8     2     cod 2017-03-04    1
9     2 haddock 2017-03-04    1
10    2    ling 2017-03-04    1
11    2     cod 2017-03-07    2
12    2 haddock 2017-03-07    2
13    2    ling 2017-03-07    2
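The cumsum(sign(diff(x))) step assumes the dates within each boat are in ascending order, which holds for the question's data. If that is not guaranteed, one possible variant (not part of the original answer) is to match each date against the sorted unique dates of its boat. The data frame `df_unsorted` is made up for illustration.

```r
# Made-up data with the dates out of order within each boat
df_unsorted <- data.frame(
  boat = c(1, 1, 1, 2, 2),
  date = as.Date(c("2017-03-02", "2017-03-01", "2017-03-02",
                   "2017-03-07", "2017-03-04"))
)

# Position of each date among the boat's sorted unique dates
df_unsorted$rank <- ave(as.integer(df_unsorted$date), df_unsorted$boat,
                        FUN = function(x) match(x, sort(unique(x))))

df_unsorted$rank  # 2 1 2 2 1
```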

As an excuse to use the newly introduced (R 3.3.0) grouping function, you can also do this

df$rank2 <- ave(as.integer(df$date), df$boat,
                FUN = function(x) {
                  tmp <- attr(grouping(x), "ends")
                  rep(seq_along(tmp), c(tmp[1], diff(tmp)))
                })
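To unpack what the grouping version is doing: as I understand it, grouping returns an ordering permutation, and its "ends" attribute holds the position at which each run of equal values ends in that sorted order. A small sketch with a made-up integer vector `x`:

```r
# Made-up, already sorted values: three of one, four of another
x <- c(10L, 10L, 10L, 20L, 20L, 20L, 20L)

g <- grouping(x)
ends <- attr(g, "ends")               # 3 7: last position of each group
sizes <- c(ends[1], diff(ends))       # 3 4: number of rows in each group
ranks <- rep(seq_along(ends), sizes)  # 1 1 1 2 2 2 2
```

Because the ranks are laid out in sorted order, this also relies on the values already being sorted, as the question's dates are.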