data.table: generate a sequence with a fixed interval based on the first and last row of each unique ID

Time: 2019-05-30 19:33:00

Tags: r data.table sequence

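I have a data table like this. How would I generate, for the rows that share the same time within an id, a sequence that increases in steps of 0.25? I am new to R and trying to do some data wrangling...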

    id time
 1:  1   14
 2:  1   14
 3:  1   14
 4:  1   14
 5:  1   18
 6:  1   18
 7:  1   22
 8:  1   22
 9:  2    8
10:  2    8
11:  2    8
12:  2    8
13:  2   12
14:  2   15
15:  2   15
16:  2   15
17:  2   19
18:  2   19
19:  2   19
20:  2   19

I would like this:

    id time new_time
 1:  1   14    14.00
 2:  1   14    14.25
 3:  1   14    14.50
 4:  1   14    14.75
 5:  1   18    18.00
 6:  1   18    18.25
 7:  1   22    22.00
 8:  1   22    22.25
 9:  2    8     8.00
10:  2    8     8.25
11:  2    8     8.50
12:  2    8     8.75
13:  2   12    12.00
14:  2   15    15.00
15:  2   15    15.25
16:  2   15    15.50
17:  2   19    19.00
18:  2   19    19.25
19:  2   19    19.50
20:  2   19    19.75

2 answers:

Answer 0: (score: 1)

You can use the length.out argument of seq and set it to the size of each time group (that is, .N in the code below, a special symbol provided by data.table; see ?.N).

DT[, new_time := seq(first(time), by = 0.25, length.out = .N), by = time][]
#    id time new_time
# 1:  1   14    14.00
# 2:  1   14    14.25
# 3:  1   14    14.50
# 4:  1   14    14.75
# 5:  1   18    18.00
# 6:  1   18    18.25
# 7:  1   22    22.00
# 8:  1   22    22.25
# 9:  2    8     8.00
#10:  2    8     8.25
#11:  2    8     8.50
#12:  2    8     8.75
#13:  2   12    12.00
#14:  2   15    15.00
#15:  2   15    15.25
#16:  2   15    15.50
#17:  2   19    19.00
#18:  2   19    19.25
#19:  2   19    19.50
#20:  2   19    19.75
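
To see what the seq() call produces for a single group, here is a minimal base R sketch with the values hard-coded. The start value 14 and the group size 4 are taken from the first group above; inside the data.table call they come from first(time) and .N. The variable names are only for illustration.

```r
# One group of four rows that all have time == 14
four_rows <- seq(14, by = 0.25, length.out = 4)
print(four_rows)
# [1] 14.00 14.25 14.50 14.75

# A group with a single row yields just the start value
one_row <- seq(12, by = 0.25, length.out = 1)
print(one_row)
# [1] 12
```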

Another option is rowid (plus some arithmetic):

DT[, new_time := time + (rowid(time) - 1L) * 0.25]

Data

library(data.table)
DT <- fread(text = "     id time
  1   14
  1   14
  1   14
  1   14
  1   18
  1   18
  1   22
  1   22
  2    8
  2    8
  2    8
  2    8
  2   12
  2   15
  2   15
  2   15
  2   19
  2   19
  2   19
  2   19")
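
Both approaches in this answer group by time only. That is enough for the data shown, because no time value occurs under more than one id. If the same time could appear under several ids, grouping by id as well may be closer to what the question asks for. The following is a sketch under that assumption; DT2 and its column names by_time, by_id_time and seq_id_time are made up for illustration and are not part of the question.

```r
library(data.table)

# Hypothetical data: time 14 occurs under both id 1 and id 2
DT2 <- data.table(id = c(1, 1, 2, 2, 2), time = c(14, 14, 14, 14, 14))

# Counting by time alone numbers all five rows together:
# 14.00 14.25 14.50 14.75 15.00
DT2[, by_time := time + (rowid(time) - 1L) * 0.25]

# Counting by id and time restarts the sequence for each id:
# 14.00 14.25 14.00 14.25 14.50
DT2[, by_id_time := time + (rowid(id, time) - 1L) * 0.25]

# The seq() variant, grouped by both columns, gives the same result
DT2[, seq_id_time := seq(time[1L], by = 0.25, length.out = .N), by = .(id, time)]

print(DT2)
```

On the original DT, grouping by id and time returns the same new_time as grouping by time alone.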

Answer 1: (score: 0)

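Next time, consider providing a reproducible example. (See the one I include in the code below; doing this will help you get answers to future questions.)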

I used the tidyverse (specifically the dplyr package) to solve this.

## Load library (this loads lots of packages, specifically we are using dplyr)
library(tidyverse)

## Reproducible example
data <- tibble(id = c(rep(1,8),rep(2,12)),
               time = c(rep(14,4),rep(18,2),rep(22,2),rep(8,4),12,rep(15,3),rep(19,4)))

print(data)
# A tibble: 20 x 2
      id  time
   <dbl> <dbl>
 1     1    14
 2     1    14
 3     1    14
 4     1    14
 5     1    18
 6     1    18
 7     1    22
 8     1    22
 9     2     8
10     2     8
11     2     8
12     2     8
13     2    12
14     2    15
15     2    15
16     2    15
17     2    19
18     2    19
19     2    19
20     2    19

## Data with increments for each group
new_data <- data %>%
  ## Group the rows that share the same time value
  group_by(time) %>%
  ## The first row of a group keeps time; each later row adds another 0.25
  mutate(new_time = time + 0.25 * (row_number() - 1)) %>%
  ## Remove the grouping
  ungroup()

print(as.data.frame(new_data))

   id time new_time
1   1   14    14.00
2   1   14    14.25
3   1   14    14.50
4   1   14    14.75
5   1   18    18.00
6   1   18    18.25
7   1   22    22.00
8   1   22    22.25
9   2    8     8.00
10  2    8     8.25
11  2    8     8.50
12  2    8     8.75
13  2   12    12.00
14  2   15    15.00
15  2   15    15.25
16  2   15    15.50
17  2   19    19.00
18  2   19    19.25
19  2   19    19.50
20  2   19    19.75