How is the order determined in the result of non-equi joins?

Time: 2016-12-02 12:26:13

Tags: r join data.table

I'm trying to understand the underlying logic of how the result of non-equi joins in data.table is ordered within each level of the on variable.

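Just to make it clear from the start: I have no problem with the order itself, or with ordering the output in a desired way after the join. However, because I find the output of all other data.table operations highly consistent, I suspect that there is an ordering pattern in non-equi joins as well.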

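I will give two examples, in which two different 'large' data sets are joined with a smaller one. I have tried to describe the most obvious patterns within the output of each join, as well as instances where the patterns differ between the joins of the two data sets.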

library(data.table)

# the first 'large' data set
d1 <- data.table(x = c(rep(c("b", "a", "c"), each = 3), c("a", "b")),
                 y = c(rep(c(1, 3, 6), 3), 6, 6),
                 id = 1:11) # to make it easier to track the original order in the output
#     x y id
#  1: b 1  1
#  2: b 3  2
#  3: b 6  3
#  4: a 1  4
#  5: a 3  5
#  6: a 6  6
#  7: c 1  7
#  8: c 3  8
#  9: c 6  9
# 10: a 6 10
# 11: b 6 11

# the small data set
d2 <- data.table(id = 1:2, val = c(4, 2))
#    id val
# 1:  1   4
# 2:  2   2

Non-equi join between the first large data set and the small one, on = .(y >= val):

d1[d2, on = .(y >= val)]
#     x y  id  i.id
# 1:  b 4   3     1 # Row 1-5, first match: y >= val[1]; y >= 4
# 2:  a 4   6     1 # The rows within this match have the same order as the original data
# 3:  c 4   9     1 # and runs consecutively from first to last match
# 4:  a 4  10     1
# 5:  b 4  11     1

# 6:  b 2   2     2 # Row 6-13, second match: y >= val[2]; y >= 2 
# 7:  a 2   5     2 # The rows within this match do not have the same order as the original data
# 8:  c 2   8     2 # Rather, they seem to come in chunks (6-8, 9-11, 12-13)
                    # First chunk starts with the match with lowest index, y[2] 
# 9:  b 2   3     2  
# 10: a 2   6     2 
# 11: c 2   9     2 

# 12: a 2  10     2
# 13: b 2  11     2

The second 'large' data set:

d3 <- data.table(x = rep(c("a", "b", "c"), each = 3),
                 y = c(6, 1, 3),
                 id = 1:9)
#    x y id
# 1: a 6  1
# 2: a 1  2
# 3: a 3  3
# 4: b 6  4
# 5: b 1  5
# 6: b 3  6
# 7: c 6  7
# 8: c 1  8
# 9: c 3  9

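Non-equi join between the second large data set and the small one:

d3[d2, on = .(y >= val)]
#    x y id i.id
# 1: a 4  1    1 # Row 1-3, first match (y >= 4), similar to output above
# 2: b 4  4    1
# 3: c 4  7    1

# 4: a 2  3    2 # Row 4-9, second match (y >= 2).
# 5: b 2  6    2 # Again, rows not consecutive.
# 6: c 2  9    2 # However, now the first chunk does not start with the match with lowest index,
                 # y[3] instead of y[1]
# 7: a 2  1    2 # y[1] appears after y[3]
# 8: b 2  4    2 # ditto
# 9: c 2  7    2

Can anyone explain the logic behind (1) the order within each level of the on variable, in particular in the second match, where the order of the original data is not retained in the result, and (2) why the order of the chunks differs when two different data sets are used?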

1 answer:

Answer 0: (score: 7)

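Thanks for catching this and reporting it here on SO, as well as filing it on GitHub. This should be fixed now in the current development version (v1.10.5 at the time of writing).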

v1.10.6 should be available on CRAN soon.

From the NEWS entry:

  
      
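  1. The order of rows returned in non-equi joins was incorrect in certain scenarios, as reported under #1991. This is now fixed. Thanks to @Henrik-P for reporting.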
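To see whether an installed version is affected, the first example from the question can be re-run and the order inspected per match. This is only a rough sketch. It assumes that the intended behaviour is for the matches of each d2 row to come back in d1's original row order, which the NEWS entry does not spell out. The name order_check and the column in_original_order are made up for illustration.

```r
library(data.table)

# Rebuild the first example from the question.
d1 <- data.table(x = c(rep(c("b", "a", "c"), each = 3), c("a", "b")),
                 y = c(rep(c(1, 3, 6), 3), 6, 6),
                 id = 1:11)
d2 <- data.table(id = 1:2, val = c(4, 2))

res <- d1[d2, on = .(y >= val)]

# For each row of d2 (identified by i.id), check whether the matched d1 rows
# are returned with increasing id, i.e. in d1's original row order.
# Assumption: this is the order the fix is meant to restore.
order_check <- res[, .(in_original_order = !is.unsorted(id)), by = "i.id"]

print(packageVersion("data.table"))
print(order_check)
```

If in_original_order is FALSE for any i.id, the rows of that match are not in d1's original order, which is the pattern described in the question.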