My data looks like the following, but is much larger:
boat = c(1,1,1,1,1,1,1,2,2,2,2,2,2)
species = c("cod", "haddock", "ling",
"cod", "haddock", "ling", "tusk",
"cod", "haddock", "ling",
"cod", "haddock", "ling")
date = c(as.Date(c("1.03.2017","1.03.2017","1.03.2017",
"2.03.2017", "2.03.2017", "2.03.2017","2.03.2017",
"4.03.2017","4.03.2017","4.03.2017",
"7.03.2017", "7.03.2017", "7.03.2017"), "%d.%m.%Y"))
df <- data.frame(boat, species, date)
df
boat species date
1 cod 01.03.2017
1 haddock 01.03.2017
1 ling 01.03.2017
1 cod 02.03.2017
1 haddock 02.03.2017
1 ling 02.03.2017
1 tusk 02.03.2017
2 cod 04.03.2017
2 haddock 04.03.2017
2 ling 04.03.2017
2 cod 07.03.2017
2 haddock 07.03.2017
2 ling 07.03.2017
I want to create an additional column that ranks the dates in order within each boat, so that my data set looks like this:
boat species date rank
1 cod 01.03.2017 1
1 haddock 01.03.2017 1
1 ling 01.03.2017 1
1 cod 02.03.2017 2
1 haddock 02.03.2017 2
1 ling 02.03.2017 2
1 tusk 02.03.2017 2
2 cod 04.03.2017 1
2 haddock 04.03.2017 1
2 ling 04.03.2017 1
2 cod 07.03.2017 2
2 haddock 07.03.2017 2
2 ling 07.03.2017 2
I tried the following code:
library(dplyr)
df %>%
group_by(boat, species) %>%
mutate(Order = rank(date))
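However, because the grouping includes species, a species that has not appeared before is given rank "1" the first time it appears, regardless of the date. Any help is appreciated.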
Answer 0 (score: 5)
We can use group_by and dense_rank from dplyr to create the desired output.
library(dplyr)
df2 <- df %>%
group_by(boat) %>%
mutate(rank = dense_rank(date))
df2
# A tibble: 13 x 4
# Groups: boat [2]
boat species date rank
<dbl> <fctr> <date> <int>
1 1 cod 2017-03-01 1
2 1 haddock 2017-03-01 1
3 1 ling 2017-03-01 1
4 1 cod 2017-03-02 2
5 1 haddock 2017-03-02 2
6 1 ling 2017-03-02 2
7 1 tusk 2017-03-02 2
8 2 cod 2017-03-04 1
9 2 haddock 2017-03-04 1
10 2 ling 2017-03-04 1
11 2 cod 2017-03-07 2
12 2 haddock 2017-03-07 2
13 2 ling 2017-03-07 2
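To illustrate why dense_rank fits here better than rank, below is a minimal sketch on a made-up vector of dates (the vector d is an assumption for illustration, not part of the question's data). Tied dates share a rank in all three functions, but only dense_rank avoids both fractional ranks and gaps.

```r
library(dplyr)

# Hypothetical vector with tied dates, used only for illustration
d <- as.Date(c("2017-03-01", "2017-03-01", "2017-03-02",
               "2017-03-02", "2017-03-04"))

r_avg   <- rank(d)        # base R default: ties get the average position
r_min   <- min_rank(d)    # ties get the lowest position, which leaves gaps
r_dense <- dense_rank(d)  # ties share a rank and no ranks are skipped

r_avg    # 1.5 1.5 3.5 3.5 5.0
r_min    # 1 1 3 3 5
r_dense  # 1 1 2 2 3
```

Inside group_by(boat), the same ranking restarts at 1 for each boat.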
Answer 1 (score: 1)
library(dplyr)
left_join(df,
unique(df[,c(1,3)]) %>%
group_by(boat) %>%
mutate(Order = rank(date)))
## boat species date Order
## 1 1 cod 2017-03-01 1
## 2 1 haddock 2017-03-01 1
## 3 1 ling 2017-03-01 1
## 4 1 cod 2017-03-02 2
## 5 1 haddock 2017-03-02 2
## 6 1 ling 2017-03-02 2
## 7 1 tusk 2017-03-02 2
## 8 2 cod 2017-03-04 1
## 9 2 haddock 2017-03-04 1
## 10 2 ling 2017-03-04 1
## 11 2 cod 2017-03-07 2
## 12 2 haddock 2017-03-07 2
## 13 2 ling 2017-03-07 2
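The idea above is to rank each distinct boat/date pair once and then join the ranks back onto the full data. As a sketch of the same idea with the join keys spelled out, the variant below uses distinct() and an explicit by argument; the names date_ranks and res are my own and only illustrative.

```r
library(dplyr)

# Rebuild the question's data so this block runs on its own
df <- data.frame(
  boat = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  species = c("cod", "haddock", "ling",
              "cod", "haddock", "ling", "tusk",
              "cod", "haddock", "ling",
              "cod", "haddock", "ling"),
  date = as.Date(c("1.03.2017", "1.03.2017", "1.03.2017",
                   "2.03.2017", "2.03.2017", "2.03.2017", "2.03.2017",
                   "4.03.2017", "4.03.2017", "4.03.2017",
                   "7.03.2017", "7.03.2017", "7.03.2017"), "%d.%m.%Y")
)

# One row per boat/date pair, ranked within each boat
date_ranks <- df %>%
  distinct(boat, date) %>%
  group_by(boat) %>%
  mutate(Order = rank(date)) %>%
  ungroup()

# Join the ranks back; naming the keys avoids relying on automatic key detection
res <- left_join(df, date_ranks, by = c("boat", "date"))
res
```

Because each date occurs only once per boat after de-duplication, rank() cannot produce ties here, so it gives the same values as dense_rank() on the original rows.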
Answer 2 (score: 1)
library(dplyr)
df %>%
  group_by(boat) %>%
  # count how many times the date changes from one row to the next
  mutate(Order = cumsum(lag(date, default = head(date, 1)) != date) + 1)
boat species date Order
1 1 cod 2017-03-01 1
2 1 haddock 2017-03-01 1
3 1 ling 2017-03-01 1
4 1 cod 2017-03-02 2
5 1 haddock 2017-03-02 2
6 1 ling 2017-03-02 2
7 1 tusk 2017-03-02 2
8 2 cod 2017-03-04 1
9 2 haddock 2017-03-04 1
10 2 ling 2017-03-04 1
11 2 cod 2017-03-07 2
12 2 haddock 2017-03-07 2
13 2 ling 2017-03-07 2
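Note that this approach counts changes between consecutive rows, so it relies on the rows already being sorted by date within each boat. A minimal sketch, assuming the data might arrive unsorted (the small data frame unsorted is made up for illustration), is to call arrange() first:

```r
library(dplyr)

# Hypothetical unsorted data, used only for illustration
unsorted <- data.frame(
  boat = c(1, 1, 1),
  date = as.Date(c("2017-03-02", "2017-03-01", "2017-03-02"))
)

sorted_ranks <- unsorted %>%
  arrange(boat, date) %>%   # sort so equal dates sit next to each other
  group_by(boat) %>%
  mutate(Order = cumsum(lag(date, default = first(date)) != date) + 1) %>%
  ungroup()

sorted_ranks$Order  # 1 2 2
```

Without the arrange() step, the two rows dated 2017-03-02 would not be adjacent and would end up with different values.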
Answer 3 (score: 1)
In base R, you can use ave for the group-level calculation, and apply cumsum, diff, and sign to the integers constructed from the date variable.
df$rank <- ave(as.integer(df$date),
df$boat, FUN=function(x) cumsum(c(1, sign(diff(x)))))
which returns
df
boat species date rank
1 1 cod 2017-03-01 1
2 1 haddock 2017-03-01 1
3 1 ling 2017-03-01 1
4 1 cod 2017-03-02 2
5 1 haddock 2017-03-02 2
6 1 ling 2017-03-02 2
7 1 tusk 2017-03-02 2
8 2 cod 2017-03-04 1
9 2 haddock 2017-03-04 1
10 2 ling 2017-03-04 1
11 2 cod 2017-03-07 2
12 2 haddock 2017-03-07 2
13 2 ling 2017-03-07 2
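To see why sign is needed, here is a small walk-through of the inner function on the dates of boat 2 alone (the variable names are mine, for illustration). The dates are three days apart, so diff alone would add 3 to the running total instead of 1.

```r
# Dates for boat 2 from the question, converted to integers (days since 1970-01-01)
x <- as.integer(as.Date(c("2017-03-04", "2017-03-04", "2017-03-04",
                          "2017-03-07", "2017-03-07", "2017-03-07")))

steps <- diff(x)               # 0 0 3 0 0: gap in days between consecutive rows
flags <- sign(steps)           # 0 0 1 0 0: 1 wherever the date moves forward
ranks <- cumsum(c(1, flags))   # 1 1 1 2 2 2: start at 1, add 1 at each new date
ranks
```

Like any consecutive-difference approach, this assumes the rows are sorted by date within each boat.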
As an excuse to use the newly introduced (R 3.3.0) grouping function, you could also do
df$rank2 <- ave(as.integer(df$date), df$boat,
FUN=function(x) {tmp <- attr(grouping(x), "ends");
rep(seq_along(tmp), c(tmp[1], diff(tmp)))})
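To unpack what the function above does, the sketch below runs the same steps on a small sorted integer vector (the vector and variable names are made up for illustration). The "ends" attribute of grouping() gives the position in the ordered vector where each group of equal values ends, so differencing it yields the group sizes.

```r
# Hypothetical sorted values: three copies of 10, then two copies of 20
x <- c(10L, 10L, 10L, 20L, 20L)

ends  <- attr(grouping(x), "ends")     # 3 5: last position of each group
sizes <- c(ends[1], diff(ends))        # 3 2: number of elements per group
ranks <- rep(seq_along(ends), sizes)   # 1 1 1 2 2: group index repeated per element
ranks
```

The ranks come out in sorted order, so this also assumes the input is already sorted by date within each boat.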