data.table: NAs from a join by subset

Date: 2017-10-22 04:46:01

Tags: r data.table

This is a query from an earlier thread I came across. There are two tables, DT1 and DT2:

    DT1
  Country       State      City Start  End
1      IN   Telangana Hyderabad   100  200
2      IN Maharashtra      Pune   300  400
3      IN     Haryana   Gurgaon   500  600
4      IN Maharashtra      Pune   700  800
5      IN     Gujarat Ahmedabad   900 1000

DT2 with 7 rows
ID  No
1   157
2   346
3   389
4   453
5   562
6   9874
7   98745

Joining them with this code:

DT2[DT1, on=.(No>Start,No<End), ]

produces this output, with 6 rows:

   ID  No No.1 Country       State      City
1:  1 100  200      IN   Telangana Hyderabad
2:  2 300  400      IN Maharashtra      Pune
3:  3 300  400      IN Maharashtra      Pune
4:  5 500  600      IN     Haryana   Gurgaon
5: NA 700  800      IN Maharashtra      Pune
6: NA 900 1000      IN     Gujarat Ahmedabad
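One way to check where the NA rows come from is to reproduce the join and look at the DT1-side columns. This is a minimal sketch, assuming data.table's default `nomatch = NA`: in `x[i]` the result is driven by the rows of `i` (here DT1), so every DT1 range appears at least once, and a range that contains no `No` value gets NA in DT2's own column `ID`. The tables are rebuilt with `data.table()` (character rather than factor columns), and `res` is only an illustrative name.

```r
library(data.table)

# Question data, rebuilt with data.table() instead of structure()
DT1 <- data.table(
  Country = "IN",
  State   = c("Telangana", "Maharashtra", "Haryana", "Maharashtra", "Gujarat"),
  City    = c("Hyderabad", "Pune", "Gurgaon", "Pune", "Ahmedabad"),
  Start   = c(100L, 300L, 500L, 700L, 900L),
  End     = c(200L, 400L, 600L, 800L, 1000L)
)
DT2 <- data.table(ID = 1:7,
                  No = c(157L, 346L, 389L, 453L, 562L, 9874L, 98745L))

# x = DT2, i = DT1: one or more result rows per DT1 row, in DT1 order.
# In a non-equi join the join columns carry i's Start/End values.
res <- DT2[DT1, on = .(No > Start, No < End)]
print(res)

# Rows with NA in ID are DT1 ranges that contain no DT2 value
print(res[is.na(ID)])
```

If this holds, the two NA rows belong to the Pune 700-800 and Ahmedabad 900-1000 ranges of DT1, and DT2 rows that match nothing (ID 4, 6 and 7) do not appear at all, since `x[i]` does not keep unmatched rows of `x`.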

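I can understand the NAs corresponding to ID 6 and 7 (row numbers 5 and 6), but why is there no NA corresponding to ID 4? ID 4 has No = 453, which falls in none of the ranges in DT1, so shouldn't it produce an NA?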

EDIT 1: code to create the datasets:

library(data.table)

DT1<-
structure(list(Country = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "IN", class = "factor"), 
    State = structure(c(4L, 3L, 2L, 3L, 1L), .Label = c("Gujarat", 
    "Haryana", "Maharashtra", "Telangana"), class = "factor"), 
    City = structure(c(3L, 4L, 2L, 4L, 1L), .Label = c("Ahmedabad", 
    "Gurgaon", "Hyderabad", "Pune"), class = "factor"), Start = c(100L, 
    300L, 500L, 700L, 900L), End = c(200L, 400L, 600L, 800L, 
    1000L)), .Names = c("Country", "State", "City", "Start", 
"End"), class = c("data.table", "data.frame"))
DT2<-
structure(list(ID = 1:7, No = c(157L, 346L, 389L, 453L, 562L, 
9874L, 98745L)), .Names = c("ID", "No"), class = c("data.table", 
"data.frame"))
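If the goal is one output row per DT2 value, with NA where the value falls in no DT1 range, a sketch under the same `nomatch = NA` assumption is to swap the roles so that DT2 becomes `i`. The conditions in `on` are written as x-column, operator, i-column, so `No > Start` becomes `Start < No` and `No < End` becomes `End > No`. The names `res2` and `unmatched_ids` are illustrative only.

```r
library(data.table)

# Question data, rebuilt with data.table() instead of structure()
DT1 <- data.table(
  Country = "IN",
  State   = c("Telangana", "Maharashtra", "Haryana", "Maharashtra", "Gujarat"),
  City    = c("Hyderabad", "Pune", "Gurgaon", "Pune", "Ahmedabad"),
  Start   = c(100L, 300L, 500L, 700L, 900L),
  End     = c(200L, 400L, 600L, 800L, 1000L)
)
DT2 <- data.table(ID = 1:7,
                  No = c(157L, 346L, 389L, 453L, 562L, 9874L, 98745L))

# DT2 is now i: every DT2 row appears in the result, in DT2 order.
# Conditions read x-column op i-column: Start < No and End > No.
res2 <- DT1[DT2, on = .(Start < No, End > No)]
print(res2)

# DT2 values inside no DT1 range get NA in DT1's non-join columns (e.g. City)
unmatched_ids <- res2[is.na(City), ID]
print(unmatched_ids)
```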
  

0 answers