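I have two tables.
I am trying to write a program that does the following:
It finds the matching observations in the two tables by name and date, then:
If the first and last intraday prices fall outside the range from 0.962 to 1.0398 times the previous day's daily price, it deletes all data for that name and date from table 1.
The rule is:
If the first and the last intraday price for a given name and date are not within [0.962 * (yesterday's daily price), 1.0398 * (yesterday's daily price)], then delete.
For example, consider the following two tables: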
data WORK.TABLE1;
infile datalines dsd truncover;
input name:$3. date:DATE9. time:TIME8. intraday_price:32.;
format date DATE9. time TIME8.;
label name="name" date="date" time="time" intraday_price="intraday price";
datalines4;
A,07MAY2008,11:32:41,3
A,07MAY2008,12:32:41,2
A,07MAY2008,13:32:41,1
A,08MAY2008,11:32:41,3.95
A,08MAY2008,12:32:41,3
A,08MAY2008,13:32:41,6
A,08MAY2008,14:32:41,4.01
B,07MAY2008,11:32:41,3.1
B,07MAY2008,12:32:41,1
B,07MAY2008,13:32:41,4
B,07MAY2008,14:32:41,2.9
B,08MAY2008,11:32:41,6
B,08MAY2008,12:32:41,1
B,09MAY2008,11:32:41,5
B,09MAY2008,12:32:41,7
C,07MAY2008,11:32:41,3
C,07MAY2008,12:32:41,2
C,08MAY2008,11:32:41,6.1
C,08MAY2008,12:32:41,3
C,08MAY2008,13:32:41,2
C,09MAY2008,11:32:41,8
C,09MAY2008,12:32:41,2
C,09MAY2008,13:32:41,3
C,09MAY2008,14:32:41,2
;;;;
Table 2 is:
data WORK.TABLE2;
infile datalines dsd truncover;
input name:$3. date:DATE9. daily_price:32.;
format date DATE9.;
label name="name" date="date" daily_price="daily price";
datalines4;
A,05MAY2008,3
B,05MAY2008,6
C,05MAY2008,5
A,06MAY2008,5
A,07MAY2008,4
B,06MAY2008,3
B,07MAY2008,4
B,08MAY2008,3
C,06MAY2008,7
C,07MAY2008,6
C,08MAY2008,5
;;;;
Note that the formula uses yesterday's daily price.
The result is:
+------+----------+----------+----------------+
| name | date     | time     | intraday price |
+------+----------+----------+----------------+
| B    | 7-May-08 | 11:32:41 | 3.1            |
| B    | 7-May-08 | 12:32:41 | 1              |
| B    | 7-May-08 | 13:32:41 | 4              |
| B    | 7-May-08 | 14:32:41 | 2.9            |
| A    | 8-May-08 | 11:32:41 | 3.95           |
| A    | 8-May-08 | 12:32:41 | 3              |
| A    | 8-May-08 | 13:32:41 | 6              |
| A    | 8-May-08 | 14:32:41 | 4.01           |
| C    | 8-May-08 | 11:32:41 | 6.1            |
| C    | 8-May-08 | 12:32:41 | 3              |
| C    | 8-May-08 | 13:32:41 | 2              |
+------+----------+----------+----------------+
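To make the rule concrete, here is a small sketch that works the band out by hand for three of the groups, with the prices copied from the two sample tables. The data set name band_check and the variables prev_daily, first_price, last_price, lower, upper, first_in, last_in, and delete_group are made up for this illustration. It assumes the reading that matches the result above: a group is deleted only when both the first and the last intraday price fall outside the band.

```sas
/* Hand-worked check of the band for three (name, date) groups.            */
/* All data set and variable names are illustrative, not from the tables. */
data band_check;
   length name $3;
   format date date9.;
   input name $ date :date9. prev_daily first_price last_price;
   lower = 0.962  * prev_daily;                  /* lower edge of the band */
   upper = 1.0398 * prev_daily;                  /* upper edge of the band */
   first_in = (lower <= first_price <= upper);   /* 1 if inside, 0 if not  */
   last_in  = (lower <= last_price  <= upper);
   /* assumed reading: delete only when BOTH prices are outside the band */
   delete_group = (not first_in and not last_in);
   datalines;
A 08MAY2008 4 3.95 4.01
C 08MAY2008 6 6.1 2
B 08MAY2008 4 6 1
;
run;

proc print data=band_check noobs;
run;
```

For A on 08MAY2008 the band is [3.848, 4.1592] and both 3.95 and 4.01 are inside, so the group stays. For C on 08MAY2008 the band is [5.772, 6.2388]; the first price 6.1 is inside and the last price 2 is not, and the group is still kept in the result above. For B on 08MAY2008 the band is [3.848, 4.1592] and both 6 and 1 are outside, so the group is removed.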
Can you tell me how to do this?
Thanks in advance.
Answer 0 (score: 0)
This identifies the rows you do not want:
select t1.*
from table1 t1
join table2 t2 on t1.name = t2.name and t1.date = t2.date
where (t1.intraday_price < (t2.daily_price*0.962)
or t1.intraday_price > (t2.daily_price*1.0398)
)
If you put this in a subquery and then test that subquery with EXISTS, you identify the rows you do not want.
Demo: SQL Fiddle
CREATE TABLE Table1
([name] varchar(1), [date] datetime, [time] varchar(8), [intraday_price] decimal(12,2))
;
INSERT INTO Table1
([name], [date], [time], [intraday_price])
VALUES
('A', '2008-05-07 00:00:00', '11:32:41', 3),
('A', '2008-05-07 00:00:00', '12:32:41', 2),
('A', '2008-05-07 00:00:00', '13:32:41', 1),
('A', '2008-05-08 00:00:00', '11:32:41', 3.95),
('A', '2008-05-08 00:00:00', '12:32:41', 3),
('A', '2008-05-08 00:00:00', '13:32:41', 6),
('A', '2008-05-08 00:00:00', '14:32:41', 4.01),
('B', '2008-05-07 00:00:00', '11:32:41', 3.1),
('B', '2008-05-07 00:00:00', '12:32:41', 1),
('B', '2008-05-07 00:00:00', '13:32:41', 4),
('B', '2008-05-07 00:00:00', '14:32:41', 2.9),
('B', '2008-05-08 00:00:00', '11:32:41', 6),
('B', '2008-05-08 00:00:00', '12:32:41', 1),
('B', '2008-05-09 00:00:00', '11:32:41', 5),
('B', '2008-05-09 00:00:00', '12:32:41', 7),
('C', '2008-05-07 00:00:00', '11:32:41', 3),
('C', '2008-05-07 00:00:00', '12:32:41', 2),
('C', '2008-05-08 00:00:00', '11:32:41', 6.1),
('C', '2008-05-08 00:00:00', '12:32:41', 3),
('C', '2008-05-08 00:00:00', '13:32:41', 2),
('C', '2008-05-09 00:00:00', '11:32:41', 8),
('C', '2008-05-09 00:00:00', '12:32:41', 2),
('C', '2008-05-09 00:00:00', '13:32:41', 3),
('C', '2008-05-09 00:00:00', '14:32:41', 2)
;
CREATE TABLE Table2
([name] varchar(1), [date] datetime, [daily_price] decimal(12,2))
;
INSERT INTO Table2
([name], [date], [daily_price])
VALUES
('A', '2008-05-05 00:00:00', 3),
('B', '2008-05-05 00:00:00', 6),
('C', '2008-05-05 00:00:00', 5),
('A', '2008-05-06 00:00:00', 5),
('A', '2008-05-07 00:00:00', 4),
('B', '2008-05-06 00:00:00', 3),
('B', '2008-05-07 00:00:00', 4),
('B', '2008-05-08 00:00:00', 3),
('C', '2008-05-06 00:00:00', 7),
('C', '2008-05-07 00:00:00', 6),
('C', '2008-05-08 00:00:00', 5)
;
Query 1:
with cte as (
select
*
from Table1
where exists (
select NULL
from table1 t1
join table2 t2 on t1.name = t2.name and t1.date = t2.date
where (t1.intraday_price < (t2.daily_price*0.962)
or t1.intraday_price > (t2.daily_price*1.0398)
)
and table1.name = t1.name and table1.date = t1.date and table1.time = t1.time
)
)
delete
from cte
;
select * from table1
Results:
| name | date | time | intraday_price |
|------|----------------------|----------|----------------|
| A | 2008-05-08T00:00:00Z | 11:32:41 | 3.95 |
| A | 2008-05-08T00:00:00Z | 12:32:41 | 3 |
| A | 2008-05-08T00:00:00Z | 13:32:41 | 6 |
| A | 2008-05-08T00:00:00Z | 14:32:41 | 4.01 |
| B | 2008-05-07T00:00:00Z | 13:32:41 | 4 |
| B | 2008-05-09T00:00:00Z | 11:32:41 | 5 |
| B | 2008-05-09T00:00:00Z | 12:32:41 | 7 |
| C | 2008-05-09T00:00:00Z | 11:32:41 | 8 |
| C | 2008-05-09T00:00:00Z | 12:32:41 | 2 |
| C | 2008-05-09T00:00:00Z | 13:32:41 | 3 |
| C | 2008-05-09T00:00:00Z | 14:32:41 | 2 |
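Answer 1 (score: 0)
Rather than deleting from the source table, create a new data set filtered to the records you want. Specifically, consider an exists subquery that selects records according to the required logic.
The query below uses a self-join on table1 to line up the min and max time records for the same name and date, and keeps a name and date group in the result set only if both of those intraday_price values fall within the price range.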
proc sql;
   create table newtable as
   select *
   from work.table1 main
   where exists(
      select 1
      from work.table1 m1
      inner join work.table1 m2
         on m1.name = m2.name and m1.date = m2.date
      /* yesterday's daily price: t2.date is one day before m1.date */
      inner join work.table2 t2
         on m1.name = t2.name and m1.date = intnx("day", t2.date, 1)
      /* first and last intraday time per name and date */
      inner join
         (select t.name, t.date, min(t.time) as min_time, max(t.time) as max_time
          from work.table1 t
          group by t.name, t.date
         ) agg
         on m1.name = agg.name and m1.date = agg.date
         and m1.time = agg.min_time and m2.time = agg.max_time
      /* keep the group only when both the first and the last price are in range */
      where m1.intraday_price between (0.962 * t2.daily_price) and (1.0398 * t2.daily_price)
        and m2.intraday_price between (0.962 * t2.daily_price) and (1.0398 * t2.daily_price)
        and main.name = m1.name and main.date = m1.date);
quit;
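For comparison, here is a sketch of the same min/max time idea split into separate steps, which can be easier to inspect than one correlated subquery. It rests on two assumptions that are not stated in the question: a group is kept when at least one of its first and last prices is inside the band (the reading under which C on 08MAY2008 survives in the expected result), and "yesterday" means the previous calendar day, so a group with no table2 row for that day is dropped. The table names first_last, keep_groups, and want_sketch are hypothetical.

```sas
proc sql;
   /* first and last intraday time for every (name, date) group */
   create table first_last as
   select t.name, t.date, min(t.time) as min_time, max(t.time) as max_time
   from work.table1 as t
   group by t.name, t.date;

   /* groups to keep: first OR last price inside the band built from   */
   /* the daily price dated one calendar day earlier (an assumption)   */
   create table keep_groups as
   select fl.name, fl.date
   from first_last as fl
   inner join work.table1 as f
      on f.name = fl.name and f.date = fl.date and f.time = fl.min_time
   inner join work.table1 as l
      on l.name = fl.name and l.date = fl.date and l.time = fl.max_time
   inner join work.table2 as y
      on y.name = fl.name and y.date = fl.date - 1
   where (f.intraday_price between (0.962 * y.daily_price) and (1.0398 * y.daily_price))
      or (l.intraday_price between (0.962 * y.daily_price) and (1.0398 * y.daily_price));

   /* all intraday rows that belong to a kept group */
   create table want_sketch as
   select t1.*
   from work.table1 as t1
   inner join keep_groups as k
      on t1.name = k.name and t1.date = k.date
   order by t1.date, t1.name, t1.time;
quit;
```

With the sample tables from the question, this should keep the groups B on 07MAY2008, A on 08MAY2008, and C on 08MAY2008, which are the 11 rows of the expected result. Replacing the or with and in keep_groups gives the stricter rule in which both prices must be inside the band.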
Answer 2 (score: 0)
Based on the work of Shmuel and KurtBremser in the SAS Community, the result is:
proc sort data=table1; by name date time; run;
proc sort data=table2; by name date; run;

/* intraday rows that have a daily price on the same name and date; */
/* ordered so that the BY merges below are valid                    */
proc sql;
   create table table3 as
   select * from table1, table2
   where table1.name=table2.name and table1.date=table2.date
   order by table1.name, table1.date, table1.time;
quit;

data table2_new;
   set table2;
   by name;
   /* save price of yesterday */
   lag_Price = lag(daily_price);
   if first.name then lag_Price = .;
run;

/* name and date groups whose first and last intraday prices are both out of range */
data to_delete(keep = name date);
   merge table3 (in=in1)
         table2_new (in=in2);
   by name date;
   retain start_price last_price;
   if in1 and in2; /* deal with obs on both tables only */
   if first.date then start_price = intraday_price;
   if last.date then last_price = intraday_price;
   if last.date then do;
      min_price = 0.962 * lag_Price;
      max_price = 1.0398 * lag_Price;
      if not (min_price le start_price le max_price) and not (min_price le last_price le max_price)
      then output;
   end;
run;

data want;
   merge table3 /* table2 */
         to_delete (in=indel);
   by name date;
   if not indel;
run;
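A DATA step variant of the same idea can also be sketched with a double DO UNTIL loop (often called a DOW loop), which reads each name and date group twice: once to capture its first and last price, and once to output its rows. This is only a sketch under assumptions: "yesterday" is taken as the previous calendar day rather than the previous row of table2, a group with no previous-day price is dropped, and a group is kept when at least one of the two prices is inside the band. The names t1_sorted, prev_price, t1_prev, want_dow, prev_daily, first_price, last_price, lower, and upper are hypothetical.

```sas
proc sort data=work.table1 out=t1_sorted;
   by name date time;
run;

/* re-date each daily price to the following calendar day, so that it */
/* lines up with the intraday date it is compared against             */
data prev_price (keep=name date prev_daily);
   set work.table2;
   prev_daily = daily_price;
   date = date + 1;
run;

proc sort data=prev_price;
   by name date;
run;

/* attach the previous day's price to every intraday row */
data t1_prev;
   merge t1_sorted (in=in_t1) prev_price;
   by name date;
   if in_t1;
run;

data want_dow (drop=first_price last_price lower upper prev_daily);
   /* pass 1: read the whole group to capture its first and last price */
   do until (last.date);
      set t1_prev;
      by name date;
      if first.date then first_price = intraday_price;
      if last.date then last_price = intraday_price;
   end;
   lower = 0.962  * prev_daily;
   upper = 1.0398 * prev_daily;
   /* pass 2: re-read the same group and output its rows if it is kept */
   do until (last.date);
      set t1_prev;
      by name date;
      if (lower <= first_price <= upper) or (lower <= last_price <= upper) then output;
   end;
run;
```

The output is ordered by name, date, and time rather than by date first, so the rows come out in a different order from the table in the question. When prev_daily is missing, lower and upper are missing too, both range tests are false, and the group is not written out.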