Question

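I have two tables that are based on datetime fields. To recreate the scenario, let's take commercials and sales as an example. We want to know which sale is related to which commercial.

A sale can only be tagged to the last commercial, and only if it occurs after that commercial.

Also, if a sale occurs after multiple commercials, we can tag the sale only to the last commercial; the earlier commercials should get no match (NA) in the join.

I am not able to get this last part. If a sale occurs after multiple commercials, all of those commercials are joined with that sale, which I do not want. In my example, the sale that occurred at "2017-01-01 02:05:00" should be joined with the commercial that aired at "2017-01-01 02:00:00", and not with the earlier commercials.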

[Image: output of the code]

library(lubridate)
library(data.table)

ts <- seq(as.POSIXct("2017-01-01", tz = "UTC"),
          as.POSIXct("2017-01-02", tz = "UTC"),
          by = "30 min")

commercial <-
  data.table(
    c_row_number = 1:10,
    c_time       = ts[1:10],
    c_time_roll  = ts[1:10]
  )

sale <-
  data.table(
    s_row_number = 1:4,
    s_time       = ts[5:8] + minutes(5),
    s_time_roll  = ts[5:8] + minutes(5)
  )

setkey(commercial, c_time_roll)
setkey(sale, s_time_roll)

tbl_joined <- sale[commercial, roll = -Inf] # , mult = 'last']

Any ideas on how we can get NA for c_row_number 1, 2, 3 and 4? Thanks.

Answer 1

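You can't do this directly: x[i] uses i to look up rows in x. mult is for the opposite case, when multiple rows in x match a single row in i. Here, multiple rows in i match a single row in x.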
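
To see this many-to-one matching concretely, here is a minimal sketch that rebuilds the question's tables without lubridate and counts how many commercials land on each sale. The column name join_time and the object names rolled, counts and rolled_last are my own, not from the question.

```r
library(data.table)

# Same 30-minute grid as in the question
ts <- seq(as.POSIXct("2017-01-01", tz = "UTC"), by = "30 min", length.out = 10)

commercial <- data.table(c_row_number = 1:10, c_time = ts, join_time = ts)
sale <- data.table(s_row_number = 1:4,
                   s_time    = ts[5:8] + 5 * 60,   # 5 minutes after a commercial
                   join_time = ts[5:8] + 5 * 60)

# One output row per commercial (i); each is matched to the next sale
# at or after it (roll = -Inf carries the next observation backward)
rolled <- sale[commercial, on = "join_time", roll = -Inf]

# Number of commercials attached to each sale
counts <- rolled[, .N, by = s_row_number]
print(counts)

# mult = "last" chooses among several rows of x for one row of i,
# so it does not reduce the number of rows here
rolled_last <- sale[commercial, on = "join_time", roll = -Inf, mult = "last"]
nrow(rolled_last) == nrow(rolled)
```

The first sale collects every commercial that aired before it, which is exactly the behaviour the question wants to avoid.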

Your best option, then, is to post-process the joined table. For example, to drop those rows, you could use unique:

unique(sale[commercial, roll = -Inf], by = 's_row_number', fromLast = TRUE)
#    s_row_number              s_time         s_time_roll c_row_number
# 1:            1 2017-01-01 02:05:00 2017-01-01 02:00:00            5
# 2:            2 2017-01-01 02:35:00 2017-01-01 02:30:00            6
# 3:            3 2017-01-01 03:05:00 2017-01-01 03:00:00            7
# 4:            4 2017-01-01 03:35:00 2017-01-01 03:30:00            8
# 5:           NA                <NA> 2017-01-01 04:30:00           10
#                 c_time
# 1: 2017-01-01 02:00:00
# 2: 2017-01-01 02:30:00
# 3: 2017-01-01 03:00:00
# 4: 2017-01-01 03:30:00
# 5: 2017-01-01 04:30:00

I suspect you created {s,c}_row_number just for this task; without those columns, you could do this instead:

sale[commercial, roll = -Inf][order(-c_time)][rowid(s_time) == 1L]

We sort by c_time in descending order to make sure that rowid picks up the most recent value.
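
As a rough illustration of why the descending sort matters, the toy table below stands in for the joined result. The numeric times and the names joined, sorted, rank_in_sale and latest are made up for brevity and are not part of the answer.

```r
library(data.table)

# Toy stand-in for the joined table: three commercials matched to sale 1,
# one commercial matched to sale 2 (times are plain numbers for brevity)
joined <- data.table(s_time = c(1, 1, 1, 2), c_time = c(10, 20, 30, 40))

# rowid() numbers the rows within each s_time group in order of appearance,
# so after a descending sort the latest commercial gets number 1
sorted <- joined[order(-c_time)]
sorted[, rank_in_sale := rowid(s_time)]
print(sorted)

# Keeping number 1 leaves the latest commercial for each sale
latest <- sorted[rank_in_sale == 1L]
print(latest)
```

Without the sort, rowid would number rows in their original order and the filter would keep the earliest commercial for each sale instead.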

Note that in both cases, one of the is.na(s_time) rows is dropped.
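
If the goal is to keep every commercial and show NA for those that are not the last one before a sale, as the question asks, one possible sketch is to reverse the lookup: for each sale, find the last commercial at or before it with roll = Inf, then join that result back onto commercial. This is an alternative added for illustration, not the approach described above; join_time, matched and result are hypothetical names.

```r
library(data.table)

ts <- seq(as.POSIXct("2017-01-01", tz = "UTC"), by = "30 min", length.out = 10)

commercial <- data.table(c_row_number = 1:10, c_time = ts, join_time = ts)
sale <- data.table(s_row_number = 1:4,
                   s_time    = ts[5:8] + 5 * 60,
                   join_time = ts[5:8] + 5 * 60)

# For each sale (i), roll = Inf carries the last commercial at or before
# the sale time forward, so every sale gets exactly one commercial
matched <- commercial[sale, on = "join_time", roll = Inf]

# Join back so that every commercial is kept; commercials that were not
# the last one before a sale get NA in the sale columns
result <- matched[, .(c_row_number, s_row_number, s_time)][commercial,
                                                           on = "c_row_number"]
result[, join_time := NULL]
print(result)
```

If two sales follow the same commercial, that commercial appears once per sale in result, so this sketch assumes that repeating a commercial in that case is acceptable.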

Hopefully this gets you headed in the right direction.

Answer 2

If your commercial times are sorted, or you can sort them, then you can use a non-equi join with the help of a time-shifted helper column:

library(lubridate)
library(data.table)

ts <- seq(as.POSIXct("2017-01-01", tz = "UTC"),
          as.POSIXct("2017-01-02", tz = "UTC"),
          by = "30 min")

commercial <-
  data.table(
    c_row_number = 1:10,
    c_time       = ts[1:10],
    c_next_time  = shift(ts[1:10], type = "lead", fill = max(ts))
  )

sale <-
  data.table(
    s_row_number = 1:4,
    s_time       = ts[5:8] + minutes(5),
    s_time_join  = ts[5:8] + minutes(5)
  )

tbl_joined <- sale[commercial, on = .(s_time_join >= c_time, s_time_join < c_next_time)]
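
The half-open window from c_time (inclusive) to c_next_time (exclusive) is what lets each sale fall into exactly one commercial's window. Below is a small check of that property, assuming the commercials can be sorted by time. The setorder call, the 5 * 60 offset in place of minutes(5), and the names joined and matches_per_sale are my additions; the rest mirrors the code above.

```r
library(data.table)

ts <- seq(as.POSIXct("2017-01-01", tz = "UTC"),
          as.POSIXct("2017-01-02", tz = "UTC"),
          by = "30 min")

commercial <- data.table(c_row_number = 1:10, c_time = ts[1:10])
setorder(commercial, c_time)   # shift() below relies on this ordering
commercial[, c_next_time := shift(c_time, type = "lead", fill = max(ts))]

sale <- data.table(s_row_number = 1:4,
                   s_time      = ts[5:8] + 5 * 60,
                   s_time_join = ts[5:8] + 5 * 60)

# A sale matches a commercial only if c_time <= sale time < c_next_time
joined <- sale[commercial,
               on = .(s_time_join >= c_time, s_time_join < c_next_time)]

# Each sale should appear exactly once across all commercial windows
matches_per_sale <- joined[!is.na(s_row_number), .N, by = s_row_number]
all(matches_per_sale$N == 1L)
```

Because the windows do not overlap, the commercials that aired before the last one get NA rather than a copy of the sale.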

If you want to use this idiom, which adds the column to commercial by reference:

commercial[, s_time := sale[.SD,
                            .(s_time),
                            on = .(s_time_join >= c_time, s_time_join < c_next_time)]]
print(commercial)
    c_row_number              c_time         c_next_time              s_time
 1:            1 2017-01-01 00:00:00 2017-01-01 00:30:00                <NA>
 2:            2 2017-01-01 00:30:00 2017-01-01 01:00:00                <NA>
 3:            3 2017-01-01 01:00:00 2017-01-01 01:30:00                <NA>
 4:            4 2017-01-01 01:30:00 2017-01-01 02:00:00                <NA>
 5:            5 2017-01-01 02:00:00 2017-01-01 02:30:00 2017-01-01 02:05:00
 6:            6 2017-01-01 02:30:00 2017-01-01 03:00:00 2017-01-01 02:35:00
 7:            7 2017-01-01 03:00:00 2017-01-01 03:30:00 2017-01-01 03:05:00
 8:            8 2017-01-01 03:30:00 2017-01-01 04:00:00 2017-01-01 03:35:00
 9:            9 2017-01-01 04:00:00 2017-01-01 04:30:00                <NA>
10:           10 2017-01-01 04:30:00 2017-01-02 00:00:00                <NA>

Using roll in a data.table join, can I strictly force a single match?
