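I have a table with 50k rows containing column A (BIGINT, e.g. a customer account ID) and column B (date, e.g. the date of last purchase).
I want to find out how many customers made their last purchase within the top 25%, top 50%, and top 75% tiles of a given date range, so that, looking across all these customer account IDs, I can tell where the bulk of our most recent purchases falls. Any ideas on how to do this in SQL?
Table: alltransations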
ACCT_ID | DATE
----------------|---------------
23748234782947 | 05-15-2016
28178792839838 | 05-01-2016
28178092734538 | 02-12-2016
28347732839867 | 01-15-2016
28170909362959 | 10-10-2015
28171334099090 | 11-11-2015
28109129330023 | 12-25-2014
28172377859289 | 10-31-2014
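Answer 0: (score: 0)
I am not sure I follow the "tile" part, but if you mean dividing the time range into four sections, then for the interval from 2016-02-01 to 2016-06-01 it would look like this. Trade-off: the interval boundaries are worked out by hand; this could also be done with date arithmetic.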
CREATE TABLE tblA ( ACCT_ID INTEGER, PDATE DATE);
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1000,'2016-05-21');
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1001,'2016-05-11');
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1002,'2016-05-24');
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1003,'2016-04-21');
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1004,'2016-02-12');
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1005,'2016-02-21');
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1001,'2016-03-22');
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1002,'2016-04-01');
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1005,'2016-04-01');
INSERT INTO tblA(ACCT_ID, PDATE) VALUES (1006,'2016-04-01');
-- Columns: range label, rows in that range, rows in the whole interval.
-- The share per range is CNT / TOTALCNT.
SELECT DISTR.DATE_RANGE, COUNT(DISTR.ACCT_ID) AS CNT, OVRL.TOTALCNT
FROM (
    -- Each branch tags the rows inside one cumulative window ending on 2016-06-01.
    -- The date literals are day.month.year, so the format string is '%d.%m.%Y'.
    SELECT 'TOP25' AS DATE_RANGE, A.ACCT_ID
    FROM tblA A
    WHERE A.PDATE BETWEEN STR_TO_DATE('01.05.2016', '%d.%m.%Y') AND STR_TO_DATE('01.06.2016', '%d.%m.%Y')
    UNION ALL
    SELECT 'TOP50' AS DATE_RANGE, B.ACCT_ID
    FROM tblA B
    WHERE B.PDATE BETWEEN STR_TO_DATE('01.04.2016', '%d.%m.%Y') AND STR_TO_DATE('01.06.2016', '%d.%m.%Y')
    UNION ALL
    SELECT 'TOP75' AS DATE_RANGE, C.ACCT_ID
    FROM tblA C
    WHERE C.PDATE BETWEEN STR_TO_DATE('01.03.2016', '%d.%m.%Y') AND STR_TO_DATE('01.06.2016', '%d.%m.%Y')
    UNION ALL
    SELECT 'ALL' AS DATE_RANGE, D.ACCT_ID
    FROM tblA D
    WHERE D.PDATE BETWEEN STR_TO_DATE('01.02.2016', '%d.%m.%Y') AND STR_TO_DATE('01.06.2016', '%d.%m.%Y')
) DISTR
CROSS JOIN (
    -- Total over the whole interval (2016-02-01 to 2016-06-01), used as the denominator.
    SELECT COUNT(*) AS TOTALCNT
    FROM tblA T
    WHERE T.PDATE BETWEEN STR_TO_DATE('01.02.2016', '%d.%m.%Y') AND STR_TO_DATE('01.06.2016', '%d.%m.%Y')
) OVRL
GROUP BY DISTR.DATE_RANGE, OVRL.TOTALCNT
ORDER BY DISTR.DATE_RANGE;
This gives (range, row count, total):
ALL 10 10
TOP25 3 10
TOP50 7 10
TOP75 8 10
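The answer mentions that the hand-picked boundaries could instead come from date arithmetic. A minimal sketch of that idea, assuming MySQL and the tblA table above; the variable names @range_start, @range_end, and @span_days are illustrative and not part of the original answer, and FLOOR is just one possible rounding choice:

```sql
-- Assumed inputs: the same overall interval the answer uses.
SET @range_start = '2016-02-01';
SET @range_end   = '2016-06-01';
-- Length of the interval in whole days (121 for this range).
SET @span_days   = DATEDIFF(@range_end, @range_start);

SELECT
    -- A comparison yields 1 or 0 in MySQL, so SUM counts the matching rows.
    SUM(PDATE >= DATE_ADD(CAST(@range_start AS DATE), INTERVAL FLOOR(0.75 * @span_days) DAY)) AS TOP25,
    SUM(PDATE >= DATE_ADD(CAST(@range_start AS DATE), INTERVAL FLOOR(0.50 * @span_days) DAY)) AS TOP50,
    SUM(PDATE >= DATE_ADD(CAST(@range_start AS DATE), INTERVAL FLOOR(0.25 * @span_days) DAY)) AS TOP75,
    COUNT(*) AS TOTALCNT
FROM tblA
WHERE PDATE BETWEEN @range_start AND @range_end;
```

With this interval the computed cut points are 2016-05-01, 2016-04-01, and 2016-03-02, so the last one differs by a day from the hand-picked 2016-03-01. Note that this variant returns one row with a column per range rather than one row per range.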
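Answer 1: (score: 0)
This solution dynamically builds date quartiles from the full date range of the data set, then shows the percentage of IDs that fall into each quartile: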
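-- Endpoints of the full date range, as Unix timestamps.
SELECT UNIX_TIMESTAMP(MIN(date)) INTO @start FROM p;
SELECT UNIX_TIMESTAMP(MAX(date)) INTO @end FROM p;
-- Quartile boundaries at 25%, 50%, and 75% of the way through the range.
SET @25 = 0.25 * (@end - @start) + @start;
SET @50 = 0.50 * (@end - @start) + @start;
SET @75 = 0.75 * (@end - @start) + @start;
SELECT
    -- Test the latest quartile first; a row falls through to the next branch when a test fails.
    CASE WHEN UNIX_TIMESTAMP(date) > @75 THEN 4
         WHEN UNIX_TIMESTAMP(date) > @50 THEN 3
         WHEN UNIX_TIMESTAMP(date) > @25 THEN 2
         ELSE 1 END AS Quartile,
    ROUND(COUNT(id) / (SELECT COUNT(*) FROM p) * 100, 2) AS Percentage
FROM p
GROUP BY Quartile;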
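Here is a functional example with more detail and formatting.
Because the quartiles are derived from the date range rather than from row counts, if half of your dates sit at the beginning of the range and half at the end, you will only see Q1 and Q4.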
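To make the Q1/Q4 case concrete, here is a small sketch, assuming MySQL and an empty scratch schema. The table p and its id and date columns come from the answer; the four sample rows are made up for illustration:

```sql
-- Hypothetical sample data: two dates at the start of 2016, two at the end.
CREATE TABLE p (id INTEGER, `date` DATE);
INSERT INTO p (id, `date`) VALUES
    (1, '2016-01-01'), (2, '2016-01-02'),
    (3, '2016-12-30'), (4, '2016-12-31');

-- Same steps as the answer: range endpoints, then the three boundaries.
SELECT UNIX_TIMESTAMP(MIN(`date`)), UNIX_TIMESTAMP(MAX(`date`)) INTO @start, @end FROM p;
SET @25 = 0.25 * (@end - @start) + @start;
SET @50 = 0.50 * (@end - @start) + @start;
SET @75 = 0.75 * (@end - @start) + @start;

SELECT
    CASE WHEN UNIX_TIMESTAMP(`date`) > @75 THEN 4
         WHEN UNIX_TIMESTAMP(`date`) > @50 THEN 3
         WHEN UNIX_TIMESTAMP(`date`) > @25 THEN 2
         ELSE 1 END AS Quartile,
    ROUND(COUNT(id) / (SELECT COUNT(*) FROM p) * 100, 2) AS Percentage
FROM p
GROUP BY Quartile;
-- Returns two rows: Quartile 1 with 50.00 and Quartile 4 with 50.00.
-- Quartiles 2 and 3 do not appear because no date falls inside them.
```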
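First, variables are set to the endpoints of the range, then to the split point for each quartile (or whatever other partition of the time period you want). The CASE statement cascades from the latest dates to the earliest, with everything converted to UNIX_TIMESTAMP values so the arithmetic is easy; when a test fails, the row falls through to the next quartile.
The same structure can be used to split a date range into other segments, i.e. n-tiles.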
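For larger n, writing one CASE branch per boundary gets tedious. One possible generalization, sketched here under the assumption of MySQL and the same table p, computes the bucket number arithmetically. The @n variable and the Bucket alias are hypothetical names, and this formula is a substitute for the CASE cascade, not something taken from the answer:

```sql
-- Assumed bucket count; 4 should mirror the quartile query, 10 gives deciles.
SET @n = 10;
SELECT UNIX_TIMESTAMP(MIN(`date`)), UNIX_TIMESTAMP(MAX(`date`)) INTO @start, @end FROM p;

SELECT
    -- Position in the range scaled to (0, n], rounded up to a bucket number.
    -- GREATEST(..., 1) places the earliest date in bucket 1 instead of 0.
    -- NULLIF avoids dividing by zero when every row has the same date
    -- (the bucket is then NULL).
    GREATEST(
        CEIL((UNIX_TIMESTAMP(`date`) - @start) / NULLIF(@end - @start, 0) * @n),
        1
    ) AS Bucket,
    ROUND(COUNT(id) / (SELECT COUNT(*) FROM p) * 100, 2) AS Percentage
FROM p
GROUP BY Bucket
ORDER BY Bucket;
```

Using CEIL keeps a date that lands exactly on a boundary in the lower bucket, which is the same behavior as the strict > comparisons in the CASE version, though rounding in the division could shift values that sit extremely close to a boundary.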