Question

I have the following data frame (sample data), which contains the DVD recording dates for different broods of different bird pairs:

PairID   BroodRef  DVDdate
1        512       2004-05-22
1        512       2004-05-30
1        512       2004-05-26
1        588       2004-06-30
1        588       2004-07-04
1        588       2004-07-09
2        673       2004-07-19
3        543       2004-06-03
3        543       2004-06-07
3        543       2004-06-11
3        620       2004-07-19
3         39       2005-05-19
3         39       2005-05-23

What I would like is a brood number within each pair, like this:

PairID    BroodRef    DVDdate    BroodNumber
1        512       2004-05-22       1
1        512       2004-05-30       1
1        512       2004-05-26       1
1        588       2004-06-30       2
1        588       2004-07-04       2
1        588       2004-07-09       2
2        673       2004-07-19       1
3        543       2004-06-03       1
3        543       2004-06-07       1
3        543       2004-06-11       1
3        620       2004-07-19       2
3         39       2005-05-19       3
3         39       2005-05-23       3

I tried:

ddply(df,.(PairID),transform,BroodNumber = dense_rank(BroodRef))

which I saw in another question, but for pair 3 this gives BroodRef 39 a BroodNumber of 1 instead of the 3 it should be.

Any help is appreciated!

Answer 1

We can use rleid() from data.table to create a sequence based on BroodRef, grouped by PairID.

library(data.table)
setDT(df)[,BroodNumber := rleid(BroodRef), by = PairID]
#    PairID BroodRef    DVDdate BroodNumber
# 1:      1      512 2004-05-22           1
# 2:      1      512 2004-05-30           1
# 3:      1      512 2004-05-26           1
# 4:      1      588 2004-06-30           2
# 5:      1      588 2004-07-04           2
# 6:      1      588 2004-07-09           2
# 7:      2      673 2004-07-19           1
# 8:      3      543 2004-06-03           1
# 9:      3      543 2004-06-07           1
#10:      3      543 2004-06-11           1
#11:      3      620 2004-07-19           2
#12:      3       39 2005-05-19           3
#13:      3       39 2005-05-23           3
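
One caveat worth illustrating: rleid() numbers runs of consecutive equal values, so the result above relies on the rows of each brood being contiguous within a pair, as they are in the sample data. The sketch below uses a made-up vector (not taken from the question) in which one BroodRef appears in two separate runs, and compares rleid() with numbering by order of first appearance.

library(data.table)

# Hypothetical BroodRef values for a single pair; brood 512 appears in two separate runs
x <- c(512, 588, 512)

run_ids   <- rleid(x)             # one id per run of consecutive equal values
first_ids <- match(x, unique(x))  # one id per distinct value, in order of first appearance

print(run_ids)    # 1 2 3
print(first_ids)  # 1 2 1

If the real data may interleave broods like this, the match()-based numbering is probably the safer choice.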

Answer 2

We can use dplyr:

library(dplyr)
df %>%
   group_by(PairID) %>%
   mutate(BroodNumber = match(BroodRef, unique(BroodRef)))
#   PairID BroodRef    DVDdate BroodNumber
#    (int)    (int)      (chr)       (int)
#1       1      512 2004-05-22           1
#2       1      512 2004-05-30           1
#3       1      512 2004-05-26           1
#4       1      588 2004-06-30           2
#5       1      588 2004-07-04           2
#6       1      588 2004-07-09           2
#7       2      673 2004-07-19           1
#8       3      543 2004-06-03           1
#9       3      543 2004-06-07           1
#10      3      543 2004-06-11           1
#11      3      620 2004-07-19           2
#12      3       39 2005-05-19           3
#13      3       39 2005-05-23           3
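
For comparison, here is a minimal base R sketch of the same idea, assuming the data frame is called df as in the question (the DVDdate column is left out for brevity). It also shows why a value-based rank such as dense_rank() puts BroodRef 39 first in pair 3. The name by_value is only illustrative.

# Sample data from the question, without the DVDdate column
df <- data.frame(
  PairID   = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 3L, 3L, 3L, 3L, 3L, 3L),
  BroodRef = c(512L, 512L, 512L, 588L, 588L, 588L, 673L,
               543L, 543L, 543L, 620L, 39L, 39L)
)

# Rank by value within each pair: 39 is the smallest BroodRef of pair 3, so it gets 1
by_value <- ave(df$BroodRef, df$PairID,
                FUN = function(x) match(x, sort(unique(x))))

# Number by order of first appearance within each pair, as the question asks
df$BroodNumber <- ave(df$BroodRef, df$PairID,
                      FUN = function(x) match(x, unique(x)))

print(by_value)        # 1 1 1 2 2 2 1 2 2 2 3 1 1
print(df$BroodNumber)  # 1 1 1 2 2 2 1 1 1 1 2 3 3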

Creating a rank column based on two other (linked) columns in R
