Map a date column to unique sequential numbers starting at 1 in R

Time: 2017-11-30 20:48:49

Tags: r data-manipulation

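I have a column of dates in a data frame, and each date usually repeats several times. Here is a sample of my data frame, which also has the names of some sports teams in other columns: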

dput(mydf)
structure(list(date_game = structure(c(15643, 15643, 15643, 15644, 
15644, 15644, 15646, 15646), class = "Date"), team_id = c("WAS", 
"CLE", "LAL", "SAC", "CHI", "DET", "BOS", "MIL"), fran_id = c("Wizards", 
"Cavaliers", "Lakers", "Kings", "Bulls", "Pistons", "Celtics", 
"Bucks")), .Names = c("date_game", "team_id", "fran_id"), row.names = c(1L, 
2L, 3L, 7L, 8L, 9L, 29L, 30L), class = "data.frame")

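Here, mydf has 3 unique dates, and some dates are skipped. My full data frame has hundreds of unique dates. For this example, I want to add a new column to the data frame (call it date_number) that looks like this: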

mydf
    date_game team_id   fran_id  date_number
1  2012-10-30     WAS   Wizards            1
2  2012-10-30     CLE Cavaliers            1
3  2012-10-30     LAL    Lakers            1
7  2012-10-31     SAC     Kings            2
8  2012-10-31     CHI     Bulls            2
9  2012-10-31     DET   Pistons            2
29 2012-11-02     BOS   Celtics            3
30 2012-11-02     MIL     Bucks            3

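As the title says: starting at 1, I want date_number to give each date the next consecutive integer. The key point is that the numbers stay consecutive even when some dates are missing. Although 11-01 does not appear, 11-02 should still get 3, not 4.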
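A short base R sketch of the distinction described above, using the dates from the sample data (the vector and variable names are illustrative, not from the post): counting calendar days from the first date leaves a hole where 2012-11-01 is missing, whereas ranking each date among the distinct dates that actually occur gives the consecutive numbers asked for.

```r
# Dates from the sample data (rows 1-3, 7-9, 29-30); names are illustrative
dates <- as.Date(c("2012-10-30", "2012-10-30", "2012-10-30",
                   "2012-10-31", "2012-10-31", "2012-10-31",
                   "2012-11-02", "2012-11-02"))

# Counting calendar days from the first date skips a number for 2012-11-01
day_offset <- as.integer(dates - min(dates)) + 1L
print(day_offset)   # 1 1 1 2 2 2 4 4

# Ranking among the distinct dates that occur gives consecutive numbers
date_number <- match(dates, sort(unique(dates)))
print(date_number)  # 1 1 1 2 2 2 3 3
```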

Any ideas on how to do this would be greatly appreciated!

3 answers:

Answer 0 (score: 1)

You can do this with rleid from data.table:

library(data.table)
setDT(mydf)[, date_number := rleid(date_game)]

Result:

> mydf
    date_game team_id   fran_id date_number
1: 2012-10-30     WAS   Wizards           1
2: 2012-10-30     CLE Cavaliers           1
3: 2012-10-30     LAL    Lakers           1
4: 2012-10-31     SAC     Kings           2
5: 2012-10-31     CHI     Bulls           2
6: 2012-10-31     DET   Pistons           2
7: 2012-11-02     BOS   Celtics           3
8: 2012-11-02     MIL     Bucks           3

As @Mike H. mentioned, you can also borrow the rleid function from data.table without converting mydf to a data.table:

mydf$date_number <- data.table::rleid(mydf$date_game)

Another option with base R:
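A base R way to get the same run numbering as rleid, offered as an illustrative sketch rather than the answerer's own code, is to use rle(). Like rleid, it numbers runs of consecutive equal values, so it only gives the desired result when rows with the same date are adjacent (for example, after sorting by date_game). The variable names below are illustrative.

```r
# Sorted dates from the sample data; names are illustrative
dates <- as.Date(c("2012-10-30", "2012-10-30", "2012-10-30",
                   "2012-10-31", "2012-10-31", "2012-10-31",
                   "2012-11-02", "2012-11-02"))

# rle() requires a plain atomic vector, so drop the Date class first
runs <- rle(as.numeric(dates))

# Number the runs 1, 2, 3, ... and repeat each number once per row in its run
date_number <- rep(seq_along(runs$lengths), runs$lengths)
print(date_number)  # 1 1 1 2 2 2 3 3

# Caveat: when equal dates are not adjacent, every run gets a new number
shuffled <- dates[c(1, 4, 2, 7, 5)]
shuffled_runs <- rle(as.numeric(shuffled))
shuffled_number <- rep(seq_along(shuffled_runs$lengths), shuffled_runs$lengths)
print(shuffled_number)  # 1 2 3 4 5
```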

Answer 1 (score: 1)

You can use:

mydf$date_number = as.integer(as.factor(mydf$date_game))
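Because the levels of a factor built from a Date vector are sorted chronologically, the integer codes follow date order even when the rows are not sorted, which the run-based rleid approach does not guarantee. A small illustration with made-up, unsorted dates (names are illustrative):

```r
# Made-up, unsorted dates for illustration
dates <- as.Date(c("2012-11-02", "2012-10-30", "2012-10-31", "2012-10-30"))

# Factor levels are the distinct dates in sorted order; the codes index into them
codes <- as.integer(as.factor(dates))
print(levels(as.factor(dates)))  # "2012-10-30" "2012-10-31" "2012-11-02"
print(codes)                     # 3 1 2 1

# Equivalent without a factor: position of each date among the sorted unique dates
print(identical(codes, match(dates, sort(unique(dates)))))  # TRUE
```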

Answer 2 (score: 1)

Another, slightly more esoteric option:

library(dplyr)  # lag() here must be dplyr::lag; stats::lag does not shift a plain vector

mydf$date_number <- cumsum(c(1, tail(!(mydf$date_game == lag(mydf$date_game)), -1)))
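The lag() trick compares each date with the previous row's date and starts a new number whenever they differ. Here is the same idea in base R, as a sketch that avoids the dplyr dependency (names are illustrative). Like rleid, it assumes rows with equal dates are adjacent.

```r
# Sorted dates from the sample data; names are illustrative
dates <- as.Date(c("2012-10-30", "2012-10-30", "2012-10-30",
                   "2012-10-31", "2012-10-31", "2012-10-31",
                   "2012-11-02", "2012-11-02"))

# TRUE where a row's date differs from the row before; the first row always starts group 1
new_group <- c(TRUE, dates[-1] != dates[-length(dates)])

# Running count of group starts: consecutive numbers with no holes for missing days
date_number <- cumsum(new_group)
print(date_number)  # 1 1 1 2 2 2 3 3
```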