I have two dfs, as shown below.
>codes1
Country  State        City                   Start No  End No
IN       Telangana    Hyderabad              100       200
IN       Maharashtra  Pune (Bund Garden)     300       400
IN       Haryana      Gurgaon                500       600
IN       Maharashtra  Pune                   700       800
IN       Gujarat      Ahmedabad (Vastrapur)  900       1000
The second df holds the numbers I want to tag:
>codes2
ID  No
1   157
2   346
3   389
4   453
5   562
6   9874
7   98745
Now I want to tag the numbers in the No column of the codes2 df using the ranges given in the codes1 df. The expected output is:
ID  No   Country  State        City
1   157  IN       Telangana    Hyderabad
2   346  IN       Maharashtra  Pune (Bund Garden)
.
.
.
Basically, I want to tag each observation in codes2 according to which range (Start No to End No) in codes1 its No value falls into.
The rows can be in any order in either df.
Answer 0 (score: 6)
You can use the non-equi join feature of the data.table package:
library(data.table)
setDT(codes1)
setDT(codes2)
codes2[codes1, on = .(No > StartNo, No < EndNo), ## (1)
`:=`(cntry = Country, state = State, city = City)] ## (2)
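(1) finds, for each row of codes1, the indices of the matching rows in codes2, using the conditions supplied to the on argument.
(2) updates codes2 by reference for those matching rows, adding the specified columns (i.e., you do not have to assign the result back to another variable).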
This gives:
codes2
# ID No cntry state city
# 1: 1 157 IN Telangana Hyderabad
# 2: 2 346 IN Maharashtra Pune (Bund Garden)
# 3: 3 389 IN Maharashtra Pune (Bund Garden)
# 4: 4 453 NA NA NA
# 5: 5 562 IN Haryana Gurgaon
# 6: 6 9874 NA NA NA
# 7: 7 98745 NA NA NA
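For anyone who wants to run the join end to end, here is a minimal sketch that rebuilds the question's two tables and applies the same update join. It assumes the range columns are named StartNo and EndNo (without spaces, as in the join above) and, as a variation that the original answer does not make, uses >= and <= so that a number equal to a range boundary is also tagged.

```r
library(data.table)

# Rebuild the question's tables. StartNo/EndNo are written without spaces
# (an assumption, matching the column names used in the join above).
codes1 <- data.table(
  Country = "IN",
  State   = c("Telangana", "Maharashtra", "Haryana", "Maharashtra", "Gujarat"),
  City    = c("Hyderabad", "Pune (Bund Garden)", "Gurgaon", "Pune",
              "Ahmedabad (Vastrapur)"),
  StartNo = c(100, 300, 500, 700, 900),
  EndNo   = c(200, 400, 600, 800, 1000)
)
codes2 <- data.table(ID = 1:7, No = c(157, 346, 389, 453, 562, 9874, 98745))

# Update join: for each range in codes1, find the rows of codes2 whose No
# falls inside it, and add the three label columns to codes2 by reference.
# >= and <= make both endpoints inclusive (a variation on the strict version).
codes2[codes1, on = .(No >= StartNo, No <= EndNo),
       `:=`(cntry = Country, state = State, city = City)]

print(codes2)
```

Rows of codes2 whose No falls in no range keep NA in the new columns, and the row order of codes2 is left unchanged.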
Answer 1 (score: 3)
If you are comfortable writing SQL, you could use the sqldf package to do something like:
library('sqldf')
result <- sqldf('select * from codes2 left join codes1 on codes2.No between codes1.StartNo and codes1.EndNo')
You may need to remove special characters and spaces from the data frames' column names beforehand.
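One possible way to do that cleanup in base R is sketched below. The one-row data frame is only illustrative, and the regular expression (keep letters, digits, and underscores) is an assumption about which characters count as "special"; adjust it to your own column names.

```r
# Column names as they appear in the question, including spaces.
# check.names = FALSE stops data.frame() from rewriting them itself.
codes1 <- data.frame(
  Country = "IN", State = "Telangana", City = "Hyderabad",
  `Start No` = 100, `End No` = 200,
  check.names = FALSE
)

# Drop every character that is not a letter, digit, or underscore.
names(codes1) <- gsub("[^A-Za-z0-9_]", "", names(codes1))

print(names(codes1))
```

After this step the columns can be referred to as StartNo and EndNo, which is the naming both answers rely on.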