Look up and tag a range

Time: 2017-04-03 08:49:58

Tags: r

I have two dfs, as shown below

>codes1

Country       State                       City  Start No    End No
IN          Telangana                Hyderabad    100        200
IN        Maharashtra       Pune (Bund Garden)    300        400
IN            Haryana                  Gurgaon    500        600
IN        Maharashtra                     Pune    700        800
IN            Gujarat    Ahmedabad (Vastrapur)    900        1000

Now I want to tag the IP addresses in table 1

>codes2

ID     No
1      157
2      346
3      389
4      453
5      562
6      9874
7      98745

Now I want to tag the numbers given in the No column of the codes2 df using the ranges in the codes1 df. The expected output is

ID     No    Country     State          City
1     157       IN      Telangana     Hyderabad
2     346       IN     Maharashtra   Pune(Bund Garden)
.
.
. 

Basically, I want to tag the No column of codes2 with the codes1 row whose range (Start No to End No) contains the No observation.

Also, the rows in the codes2 df can be in any order.

2 answers:

Answer 0: (score: 6)

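You can use the non-equi join feature of the data.table package (the code assumes the range columns are named StartNo and EndNo, without spaces):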

library(data.table)
setDT(codes1)
setDT(codes2)

codes2[codes1, on = .(No > StartNo, No < EndNo),          ## (1)
       `:=`(cntry = Country, state = State, city = City)] ## (2)

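(1) For each row of codes1, finds the indices of the matching rows in codes2, using the conditions supplied to the on argument.

(2) Updates codes2 by reference, filling the specified columns for those matching rows (i.e., you do not have to assign the result back to another variable).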

This gives:

codes2
#    ID    No  cntry       state               city
# 1:  1   157     IN   Telangana          Hyderabad
# 2:  2   346     IN Maharashtra Pune (Bund Garden)
# 3:  3   389     IN Maharashtra Pune (Bund Garden)
# 4:  4   453     NA          NA                 NA
# 5:  5   562     IN     Haryana            Gurgaon
# 6:  6  9874     NA          NA                 NA
# 7:  7 98745     NA          NA                 NA
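
To make the matching rule concrete, here is a minimal base R sketch of the same range lookup. It is not the data.table method above, only an illustration of what the join conditions compute. It assumes the columns are named StartNo and EndNo, that the ranges in codes1 do not overlap, and it uses a hypothetical result name, tagged.

```r
# Rebuild the question's tables; column names without spaces are an assumption
codes1 <- data.frame(
  Country = "IN",
  State   = c("Telangana", "Maharashtra", "Haryana", "Maharashtra", "Gujarat"),
  City    = c("Hyderabad", "Pune (Bund Garden)", "Gurgaon", "Pune",
              "Ahmedabad (Vastrapur)"),
  StartNo = c(100, 300, 500, 700, 900),
  EndNo   = c(200, 400, 600, 800, 1000),
  stringsAsFactors = FALSE
)
codes2 <- data.frame(ID = 1:7, No = c(157, 346, 389, 453, 562, 9874, 98745))

# For each No, find the first codes1 row whose open range (StartNo, EndNo)
# contains it; NA when no range matches
idx <- vapply(codes2$No, function(n) {
  hit <- which(n > codes1$StartNo & n < codes1$EndNo)
  if (length(hit) > 0) hit[1] else NA_integer_
}, integer(1))

# Indexing with NA yields all-NA rows, so unmatched numbers stay untagged
tagged <- cbind(codes2, codes1[idx, c("Country", "State", "City")])
rownames(tagged) <- NULL
tagged
```

This loops over every number and scans every range, so it is only reasonable for small tables; the non-equi join is the better fit for large ones.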

Answer 1: (score: 3)

If you are used to writing SQL, you could consider using the sqldf package to do something like
library('sqldf')
result <- sqldf('select * from codes2 left join codes1 on codes2.No between codes1.StartNo and codes1.EndNo')

You may need to remove special characters and spaces from the data frames' column names beforehand.
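
One possible way to do that cleanup in base R is sketched below. The one-row data frame is a hypothetical stand-in that keeps the headers exactly as printed in the question ("Start No", "End No"), and the regular expression is just one reasonable choice.

```r
# Hypothetical stand-in for codes1, keeping the headers with spaces
codes1 <- data.frame(Country = "IN", State = "Telangana", City = "Hyderabad",
                     `Start No` = 100, `End No` = 200, check.names = FALSE)

# Drop every character that is not a letter, digit, or underscore
names(codes1) <- gsub("[^A-Za-z0-9_]", "", names(codes1))
names(codes1)
```

After this step the range columns are named StartNo and EndNo, which is what both answers' code expects.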