Expand 30 million data records between date ranges

Asked: 2018-12-05 15:02:32

Tags: sql sas bigdata qlikview

I have 30 million records (loans), each with a date range (FROM, TO), and I need to create a dummy record for every date within that range.

Sample data:

BALANCE   EFF_FROM_DT   EFF_TO_DT    LOAN_NBR   PAST_DUE_DT
1000      11/1/2018     11/29/2018   1234       10/29/2018

Output data:

BALANCE   Date         EFF_FROM_DT   EFF_TO_DT    LOAN_NBR   PAST_DUE_DT   DPD
1000      11/1/2018    11/1/2018     11/29/2018   1234       10/29/2018    2
1000      11/2/2018    11/1/2018     11/29/2018   1234       10/29/2018    3
1000      11/3/2018    11/1/2018     11/29/2018   1234       10/29/2018    4
...
1000      11/29/2018   11/1/2018     11/29/2018   1234       10/29/2018    30
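
To make the expected output concrete, here is a minimal Python sketch of the expansion for the single sample loan. The DPD values in the table match `Date - PAST_DUE_DT - 1` (11/1/2018 is 3 calendar days after 10/29/2018 but is shown as 2), so the sketch assumes that off-by-one convention is intended; the function name `expand_loan` is made up for illustration.

```python
from datetime import date, timedelta

def expand_loan(balance, eff_from, eff_to, loan_nbr, past_due):
    """Yield one row per calendar day in [eff_from, eff_to], inclusive."""
    day = eff_from
    while day <= eff_to:
        # Assumption inferred from the sample output only: DPD = days since PAST_DUE_DT, minus 1.
        dpd = (day - past_due).days - 1
        yield (balance, day, eff_from, eff_to, loan_nbr, past_due, dpd)
        day += timedelta(days=1)

rows = list(expand_loan(1000, date(2018, 11, 1), date(2018, 11, 29),
                        1234, date(2018, 10, 29)))
for row in rows[:3]:
    print(row)
```

One source row becomes 29 daily rows here, which is the same fan-out that turns tens of millions of loan records into hundreds of millions of daily records.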

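I need to put this in a dashboard and be able to slice the data by other dimensions (credit grade, for example) to see the daily past-due percentage. I started doing this in QlikView, which pulls the data from Netezza and expands it inside QV using the script below. Loading 27 million records (the last 12 months only) and expanding them into daily records (360 million records) takes an hour. Ideally I would like to pull more than 12 months of data (at least 3 years) to see trends, in which case QV would take far too long to process it. Is there another solution that would reduce the processing time and let me rinse and repeat this process every day?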

LOAN_HIST:
LOAD BALANCE, 
     EFF_FROM_DT, 
     EFF_TO_DT, 
     LOAN_NBR, 
     PASTDUE,
     Grade
FROM
[D:\QVDOCS\DEV\SOURCE\SHF416749\Examples\Test_data.xls]
(biff, embedded labels, table is Sheet1$);



// Preceding load: emit one row per day from EFF_FROM_DT through EFF_TO_DT.
LOAN_HIST2:
LOAD
*,
Date(EFF_FROM_DT + IterNo() - 1) As Date
While EFF_FROM_DT + IterNo() - 1 <= EFF_TO_DT
;
LOAD *
Resident LOAN_HIST order by LOAN_NBR,EFF_FROM_DT;
drop table LOAN_HIST;

LOAN_HIST3:
load
*,
day(Date) as DayOfMonth,
Date(monthstart(Date), 'MMM-YY') as MonthYear,
((year(Date)*12)+month(Date)) - (((year(PASTDUE)*12)+month(PASTDUE))) as MonthDiff
resident LOAN_HIST2;
drop table LOAN_HIST2;
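
For readers who do not use QlikView, the derived fields in LOAN_HIST3 can be sketched in Python as follows. This is only an illustration of the arithmetic in the script, not a replacement for it; the function name `derived_fields` is hypothetical, and the `'%b-%y'` format is assumed to correspond to QlikView's `'MMM-YY'`.

```python
from datetime import date

def derived_fields(day, past_due):
    """Compute the three fields that LOAN_HIST3 adds to each daily row."""
    day_of_month = day.day
    # monthstart(Date) formatted as 'MMM-YY', e.g. Nov-18 under the default C locale.
    month_year = day.replace(day=1).strftime("%b-%y")
    # Whole calendar months between the past-due date and the row's date.
    month_diff = (day.year * 12 + day.month) - (past_due.year * 12 + past_due.month)
    return day_of_month, month_year, month_diff

fields = derived_fields(date(2018, 11, 3), date(2018, 10, 29))
print(fields)
```

Note that MonthDiff counts calendar-month boundaries crossed, so a loan past due on 10/29/2018 already has a MonthDiff of 1 on 11/1/2018.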

Calendar table approach:

DatesData:
LOAD * Inline [ 
Test_Date
    11/1/2018
    11/2/2018
    11/3/2018
    11/4/2018
    11/5/2018
    11/6/2018
    11/7/2018
    11/8/2018
    11/9/2018
    11/10/2018
    11/11/2018
    11/12/2018
    11/13/2018
    11/14/2018
    11/15/2018
    11/16/2018
    11/17/2018
    11/18/2018
    11/19/2018
    11/20/2018
    11/21/2018
    11/22/2018
    11/23/2018
    11/24/2018
    11/25/2018
    11/26/2018
    11/27/2018
    11/28/2018
    11/29/2018
    11/30/2018
    12/1/2018
    12/2/2018
    12/3/2018

];


ODBC CONNECT TO [NTZ PRD] (XUserId is KbRXeRZGZJMSDZIR, XPassword is DFOcWHZMJDZAUYAHUD);

LOAN_HIST:
SQL SELECT 
EFF_FROM_DT,
EFF_TO_DT,
BALANCE,
BRACCT,
PASTDUE
FROM PSAPROD.PSADDS."SHF_DLY_CORE_HSTRY" where 
((EFF_FROM_DT >=TO_DATE('$(Today_Date_12mons)','DD-MON-YY') and EFF_FROM_DT <=TO_DATE('$(Today_Date)','DD-MON-YY'))
or
(EFF_TO_DT >=TO_DATE('$(Today_Date_12mons)','DD-MON-YY') and EFF_TO_DT <=TO_DATE('$(Today_Date)','DD-MON-YY'))
or
(EFF_TO_DT >=TO_DATE('31-DEC-9999','DD-MON-YYYY'))) and BALANCE>0
order by BRACCT,EFF_FROM_DT
;

LOAN_HIST2:
LOAD *,
if(EFF_TO_DT='12/31/9999',if(BALANCE=0, EFF_FROM_DT, date(today())),if(BALANCE=0,EFF_FROM_DT,EFF_TO_DT)) as EFF_TO_DT2

Resident LOAN_HIST order by BRACCT,EFF_FROM_DT;
drop table LOAN_HIST;


tabMatch:
IntervalMatch (Test_Date)
LOAD EFF_FROM_DT, EFF_TO_DT2
Resident LOAN_HIST2;
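
The logic of the EFF_TO_DT2 expression and of the IntervalMatch step can be sketched in Python as below. This is a rough model of the semantics (every calendar date is paired with every closed interval that contains it), not of how QlikView implements it; the names `effective_end` and `interval_match` and the fixed `today` value are assumptions made for the example.

```python
from datetime import date, timedelta

OPEN_END = date(9999, 12, 31)  # sentinel the script uses for still-open rows

def effective_end(eff_from, eff_to, balance, today):
    """Mirror the EFF_TO_DT2 expression from the script."""
    if balance == 0:
        return eff_from
    return today if eff_to == OPEN_END else eff_to

def interval_match(dates, intervals):
    """Pair every date with every closed interval [start, end] containing it."""
    return [(d, start, end)
            for (start, end) in intervals
            for d in dates
            if start <= d <= end]

# Same span as the inline DatesData table: 11/1/2018 through 12/3/2018.
calendar = [date(2018, 11, 1) + timedelta(days=i) for i in range(33)]

today = date(2018, 12, 3)  # hypothetical run date, fixed so the example is repeatable
closed = (date(2018, 11, 1), date(2018, 11, 29))
open_ended = (date(2018, 11, 20),
              effective_end(date(2018, 11, 20), OPEN_END, 1000, today))

matches = interval_match(calendar, [closed, open_ended])
print(len(matches))
```

The result has one row per (date, interval) pair, so it grows to the same size as the While/IterNo() expansion; the approach changes where the fan-out happens, not how many rows it produces.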

1 answer:

Answer 0 (score: 0)

Have you tried building the dashboard on a view that joins the data to a calendar table?

This example is SAS SQL; the Netezza version will differ slightly.

data have;
  attrib
    id balance length=8
    from_date to_date due_date format=mmddyy10. informat=mmddyy10.
  ;
  input balance from_date: mmddyy10. to_date: mmddyy10. id due_date: mmddyy10. ;
  datalines;
500        01/1/2018       2/1/2018     1234      1/15/2018
1000       11/1/2018     11/29/2018     1234     10/29/2018
1500       02/1/2018      3/15/2018     7890      1/15/2018
21000      10/1/2018     11/12/2018     7890      9/30/2018
run;

data calendar;
  do date = mdy(1,1,2018) to mdy(12,31,2018);
    output;
  end;
run;

proc sql;
  create view want_view_for_dashboard as
  select 
    have.*
  , calendar.date as as_of_date format mmddyy10.
  , case 
      when date > due_date then date-due_date /* or DB datediff function */
    end as days_past_due
  from
    have
  cross join
    calendar
  where
    calendar.date between have.from_date and have.to_date
  ;
quit;
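
The same calendar-join view can be tried without SAS or Netezza. The sketch below rebuilds it in an in-memory SQLite database from Python, using the four sample loans above. It assumes ISO-formatted text dates and SQLite's `julianday()` for the day difference; on Netezza the date arithmetic and types would be written differently, so treat this as an illustration of the join only.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE have (id INTEGER, balance REAL,
                       from_date TEXT, to_date TEXT, due_date TEXT);
    CREATE TABLE calendar (date TEXT PRIMARY KEY);

    -- One row per loan per calendar day inside the loan's date range.
    CREATE VIEW want_view_for_dashboard AS
    SELECT have.*,
           calendar.date AS as_of_date,
           CASE WHEN calendar.date > have.due_date
                THEN CAST(julianday(calendar.date) - julianday(have.due_date) AS INTEGER)
           END AS days_past_due
    FROM have
    JOIN calendar
      ON calendar.date BETWEEN have.from_date AND have.to_date;
""")

# Sample loans from the answer, with dates rewritten as ISO strings.
conn.executemany("INSERT INTO have VALUES (?, ?, ?, ?, ?)", [
    (1234, 500,   "2018-01-01", "2018-02-01", "2018-01-15"),
    (1234, 1000,  "2018-11-01", "2018-11-29", "2018-10-29"),
    (7890, 1500,  "2018-02-01", "2018-03-15", "2018-01-15"),
    (7890, 21000, "2018-10-01", "2018-11-12", "2018-09-30"),
])

# Calendar covering every day of 2018.
start = date(2018, 1, 1)
conn.executemany("INSERT INTO calendar VALUES (?)",
                 [((start + timedelta(days=i)).isoformat(),) for i in range(365)])

daily = conn.execute("""
    SELECT as_of_date, days_past_due
    FROM want_view_for_dashboard
    WHERE id = 1234 AND from_date = '2018-11-01'
    ORDER BY as_of_date
""").fetchall()
print(daily[0], daily[-1])
```

Because this is a view, the daily rows are produced at query time rather than stored, so a dashboard that filters or aggregates through it only pays for the dates it actually asks for. Like the SAS version, it computes the plain day difference, which is one higher than the DPD values in the question's sample output.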