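How to eliminate observations between two tables based on a formula?

Date: 2017-11-23 21:32:46

Tags: sql date sas

I have two tables:

  • The first table contains the variables name, date, time and intraday price. That is, each name has an intraday price at a specific date and time.
  • The second table contains name, date and daily price. The daily price aggregates the intraday prices for each name and date.

I am trying to write a program that does the following:

  1. Match observations in the two tables by name and date, then:

  2. If the first and last intraday prices are outside 0.962 and 1.0398 times the previous day's daily price, delete all data for that name and date from table 1.

  3. The statement is:

    If first and last (intraday price for the specific name and date) are not in [0.962 * (yesterday's daily price), 1.0398 * (yesterday's daily price)] then delete.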
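To make the band concrete, here is a minimal sketch of the check for a single name and date, using the sample values for A on 08MAY2008 (yesterday's daily price 4, first intraday price 3.95, last intraday price 4.01). The variable names are made up for illustration. The deletion condition assumes that a name-date is removed only when both the first and the last price fall outside the band, which is the reading that agrees with the expected result.

```sas
data _null_;
  prev_daily  = 4;      /* daily price of A on 07MAY2008 */
  first_price = 3.95;   /* first intraday price of A on 08MAY2008 */
  last_price  = 4.01;   /* last intraday price of A on 08MAY2008 */

  lo_bound = 0.962  * prev_daily;   /* 0.962  * 4 = 3.848  */
  hi_bound = 1.0398 * prev_daily;   /* 1.0398 * 4 = 4.1592 */

  /* assumed rule: delete only when BOTH prices are outside the band */
  delete_flag = not (lo_bound <= first_price <= hi_bound)
            and not (lo_bound <= last_price  <= hi_bound);

  /* write the bounds and the 0/1 flag to the log */
  put lo_bound= hi_bound= delete_flag=;
run;
```

Both prices lie inside [3.848, 4.1592] here, so under this reading the rows for A on 08MAY2008 would be kept.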

For example, consider the following two tables:

    data WORK.TABLE1;
    infile datalines dsd truncover;
    input name:$3. date:DATE9. time:TIME8. intraday_price:32.;
    format date DATE9. time TIME8.;
    label name="name" date="date" time="time" intraday_price="intraday price";
    datalines4;
    A,07MAY2008,11:32:41,3
    A,07MAY2008,12:32:41,2
    A,07MAY2008,13:32:41,1
    A,08MAY2008,11:32:41,3.95
    A,08MAY2008,12:32:41,3
    A,08MAY2008,13:32:41,6
    A,08MAY2008,14:32:41,4.01
    B,07MAY2008,11:32:41,3.1
    B,07MAY2008,12:32:41,1
    B,07MAY2008,13:32:41,4
    B,07MAY2008,14:32:41,2.9
    B,08MAY2008,11:32:41,6
    B,08MAY2008,12:32:41,1
    B,09MAY2008,11:32:41,5
    B,09MAY2008,12:32:41,7
    C,07MAY2008,11:32:41,3
    C,07MAY2008,12:32:41,2
    C,08MAY2008,11:32:41,6.1
    C,08MAY2008,12:32:41,3
    C,08MAY2008,13:32:41,2
    C,09MAY2008,11:32:41,8
    C,09MAY2008,12:32:41,2
    C,09MAY2008,13:32:41,3
    C,09MAY2008,14:32:41,2
    ;;;;
    

Table 2 is:

    data WORK.TABLE2;
    infile datalines dsd truncover;
    input name:$3. date:DATE9. daily_price:32.;
    format date DATE9.;
    label name="name" date="date" daily_price="daily price";
    datalines4;
    A,05MAY2008,3
    B,05MAY2008,6
    C,05MAY2008,5
    A,06MAY2008,5
    A,07MAY2008,4
    B,06MAY2008,3
    B,07MAY2008,4
    B,08MAY2008,3
    C,06MAY2008,7
    C,07MAY2008,6
    C,08MAY2008,5
    ;;;;
    

Note that the formula uses yesterday's daily price.

The expected result is:

    +------+----------+----------+----------------+
    | name |   date   |   time   | intraday price |
    +------+----------+----------+----------------+
    | B    | 7-May-08 | 11:32:41 |            3.1 |
    | B    | 7-May-08 | 12:32:41 |              1 |
    | B    | 7-May-08 | 13:32:41 |              4 |
    | B    | 7-May-08 | 14:32:41 |            2.9 |
    | A    | 8-May-08 | 11:32:41 |           3.95 |
    | A    | 8-May-08 | 12:32:41 |              3 |
    | A    | 8-May-08 | 13:32:41 |              6 |
    | A    | 8-May-08 | 14:32:41 |           4.01 |
    | C    | 8-May-08 | 11:32:41 |            6.1 |
    | C    | 8-May-08 | 12:32:41 |              3 |
    | C    | 8-May-08 | 13:32:41 |              2 |
    +------+----------+----------+----------------+
    
Could you tell me how to do this?

Thanks in advance.

3 answers:

Answer 0: (score: 0)

This identifies the rows you do not want:

    select t1.*
    from table1 t1
    join table2 t2 on t1.name = t2.name and t1.date = t2.date
    where (t1.intraday_price < (t2.daily_price*0.962)
       or t1.intraday_price > (t2.daily_price*1.0398)
          )

If you put that in a subquery and then test for EXISTS against that subquery, you are identifying the rows you do not want.

Demo on SQL Fiddle:

CREATE TABLE Table1
    ([name] varchar(1), [date] datetime, [time] varchar(8), [intraday_price] decimal(12,2))
;

INSERT INTO Table1
    ([name], [date], [time], [intraday_price])
VALUES
    ('A', '2008-05-07 00:00:00', '11:32:41', 3),
    ('A', '2008-05-07 00:00:00', '12:32:41', 2),
    ('A', '2008-05-07 00:00:00', '13:32:41', 1),
    ('A', '2008-05-08 00:00:00', '11:32:41', 3.95),
    ('A', '2008-05-08 00:00:00', '12:32:41', 3),
    ('A', '2008-05-08 00:00:00', '13:32:41', 6),
    ('A', '2008-05-08 00:00:00', '14:32:41', 4.01),
    ('B', '2008-05-07 00:00:00', '11:32:41', 3.1),
    ('B', '2008-05-07 00:00:00', '12:32:41', 1),
    ('B', '2008-05-07 00:00:00', '13:32:41', 4),
    ('B', '2008-05-07 00:00:00', '14:32:41', 2.9),
    ('B', '2008-05-08 00:00:00', '11:32:41', 6),
    ('B', '2008-05-08 00:00:00', '12:32:41', 1),
    ('B', '2008-05-09 00:00:00', '11:32:41', 5),
    ('B', '2008-05-09 00:00:00', '12:32:41', 7),
    ('C', '2008-05-07 00:00:00', '11:32:41', 3),
    ('C', '2008-05-07 00:00:00', '12:32:41', 2),
    ('C', '2008-05-08 00:00:00', '11:32:41', 6.1),
    ('C', '2008-05-08 00:00:00', '12:32:41', 3),
    ('C', '2008-05-08 00:00:00', '13:32:41', 2),
    ('C', '2008-05-09 00:00:00', '11:32:41', 8),
    ('C', '2008-05-09 00:00:00', '12:32:41', 2),
    ('C', '2008-05-09 00:00:00', '13:32:41', 3),
    ('C', '2008-05-09 00:00:00', '14:32:41', 2)
;



CREATE TABLE Table2
    ([name] varchar(1), [date] datetime, [daily_price] decimal(12,2))
;

INSERT INTO Table2
    ([name], [date], [daily_price])
VALUES
    ('A', '2008-05-05 00:00:00', 3),
    ('B', '2008-05-05 00:00:00', 6),
    ('C', '2008-05-05 00:00:00', 5),
    ('A', '2008-05-06 00:00:00', 5),
    ('A', '2008-05-07 00:00:00', 4),
    ('B', '2008-05-06 00:00:00', 3),
    ('B', '2008-05-07 00:00:00', 4),
    ('B', '2008-05-08 00:00:00', 3),
    ('C', '2008-05-06 00:00:00', 7),
    ('C', '2008-05-07 00:00:00', 6),
    ('C', '2008-05-08 00:00:00', 5)
;

Query 1:

with cte as (
  select
        *
  from Table1
  where exists (
    select NULL
    from table1 t1
    join table2 t2 on t1.name = t2.name and t1.date = t2.date
    where (t1.intraday_price < (t2.daily_price*0.962)
       or t1.intraday_price > (t2.daily_price*1.0398)
          )
    and table1.name = t1.name and table1.date = t1.date and table1.time = t1.time
    )
  )
delete
from cte
;

select * from table1

Results:

| name |                 date |     time | intraday_price |
|------|----------------------|----------|----------------|
|    A | 2008-05-08T00:00:00Z | 11:32:41 |           3.95 |
|    A | 2008-05-08T00:00:00Z | 12:32:41 |              3 |
|    A | 2008-05-08T00:00:00Z | 13:32:41 |              6 |
|    A | 2008-05-08T00:00:00Z | 14:32:41 |           4.01 |
|    B | 2008-05-07T00:00:00Z | 13:32:41 |              4 |
|    B | 2008-05-09T00:00:00Z | 11:32:41 |              5 |
|    B | 2008-05-09T00:00:00Z | 12:32:41 |              7 |
|    C | 2008-05-09T00:00:00Z | 11:32:41 |              8 |
|    C | 2008-05-09T00:00:00Z | 12:32:41 |              2 |
|    C | 2008-05-09T00:00:00Z | 13:32:41 |              3 |
|    C | 2008-05-09T00:00:00Z | 14:32:41 |              2 |

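Answer 1: (score: 0)

Instead of deleting from the source table, create a new dataset filtered to the records you want. Specifically, consider an exists subquery that selects records according to the required logic.

The query below self-joins table1 to line up the records with the minimum and maximum time for each name and date, then keeps a name-date group only when those first and last intraday_price values pass the price-range test.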

proc sql;
   create table newtable as

   select *
   from work.table1 main

   where exists(
     select 1 
     from work.table1 m1    

     inner join work.table1 m2
       on m1.name = m2.name and m1.date = m2.date

     inner join work.table2 t2
       /* t2 must be the PREVIOUS day's daily price, so t2.date + 1 day = m1.date */
       on m1.name = t2.name and m1.date = intnx("day", t2.date, 1)

     inner join
       (select t.name, t.date, min(t.time) as min_time, max(t.time) as max_time
        from work.table1 t
        group by t.name, t.date
       ) agg
        on m1.name = agg.name and m1.date = agg.date 
        and m1.time = agg.min_time and m2.time = agg.max_time

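     /* keep the name-date when the first OR the last price is inside the band,
        i.e. drop it only when both are outside, as in the expected result */
     where (m1.intraday_price between (0.962 * t2.daily_price) and (1.0398 * t2.daily_price)
        or m2.intraday_price between (0.962 * t2.daily_price) and (1.0398 * t2.daily_price))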

       and main.name = m1.name and main.date = m1.date);
quit;

Answer 2: (score: 0)

Based on the work of Shmuel and KurtBremser on the SAS Community, the result is:

proc sort data=table1; by name date time; run;

proc sort data=table2; by name date; run;

proc sql;
 create table table3 as
 select * from table1, table2
 where table1.name=table2.name and table1.date=table2.date;
quit;

data table2_new;
 set table2;
 by name;
 /* save yesterday's daily price (the previous row within each name) */
 lag_Price = lag(daily_price);
 if first.name then lag_Price = .;
run;

data to_delete(keep = name date);
merge table3 (in=in1) 
 table2_new (in=in2);
 by name date;
 retain start_price last_price;

 if in1 and in2; /* deal with obs on both tables only */
 if first.date then start_price = intraday_price;
 if last.date then last_price = intraday_price;
 if last.date then do; 
 min_price = 0.962 * lag_Price;
 max_price = 1.0398 * lag_Price;
 if not (min_price le start_price le max_price) and not (min_price le last_price le max_price)
 then output; 
 end;
run;

data want;
merge table3 /* table2 */
 to_delete (in=indel);
 by name date;
 if not indel;
run;
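Two things are worth keeping in mind with the steps above. lag() returns the previous observation, which is yesterday's price only when table2 has a row for every consecutive date, and the inner join that builds table3 drops intraday rows whose own date is missing from table2 (A on 08MAY2008 in the sample data). A possible variation, sketched here under the assumption that "yesterday" means the previous calendar day, shifts each daily price forward by one day and merges it directly onto table1. The dataset names prev_price, keep_days and want2 and the variable prev_daily_price are made up for this sketch.

```sas
/* Assumption: "yesterday" is the previous calendar day. */
data prev_price;
  set table2;
  date = date + 1;   /* the price of day d is the reference for day d+1 */
  rename daily_price = prev_daily_price;
run;

proc sort data=table1;     by name date time; run;
proc sort data=prev_price; by name date;      run;

/* one row per name-date whose intraday rows should be kept */
data keep_days(keep=name date);
  merge table1(in=in1) prev_price(in=in2);
  by name date;
  retain first_price;
  if in1 and in2;   /* need intraday rows and a previous-day price */
  if first.date then first_price = intraday_price;
  if last.date then do;
    lo_bound = 0.962  * prev_daily_price;
    hi_bound = 1.0398 * prev_daily_price;
    /* keep when the first or the last price is inside the band */
    if (lo_bound <= first_price <= hi_bound)
       or (lo_bound <= intraday_price <= hi_bound) then output;
  end;
run;

/* keep only the intraday rows of the surviving name-dates */
data want2;
  merge table1 keep_days(in=inkeep);
  by name date;
  if inkeep;
run;
```

Worked by hand on the sample data, this keeps B on 07MAY2008 and A and C on 08MAY2008, the same name-dates as the expected result in the question. Name-dates with no previous-day price are dropped in this sketch; whether that is the desired behaviour is a separate decision.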
