I have a table in the following format:
| user_name | date | number_of_visits |
|-----------|------|------------------|
| cat005 | 2015-06-03 | 5 |
| cat005 | 2015-06-08 | 1 |
| dog009 | 2015-06-01 | 7 |
| dog009 | 2015-06-19 | 2 |
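So for each user I have the number of visits on a given date. If a user made no visits on a given date, there is no record in the database (in other words, number_of_visits is not saved when it equals zero).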
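Now I want to use this table to generate another table that holds the ranges of activity for each user. Here we use the following definition of activity: a user is considered "active" on a given day if he/she made at least one visit during the previous 10 days. So I would like to have something like this:
| user_name | active_start | active_end |
|-----------|--------------|------------|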
| cat005 | 2015-03-02 | 2015-03-25 |
| cat005 | 2015-03-29 | 2015-06-01 |
| dog009 | 2015-04-01 | 2015-06-01 |
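Note that the data in the two examples is not consistent. By the definition used, active_end is exclusive (meaning the user has no visit on that date). For example, the first row of the table above means that the user was active on 2015-03-02 (he/she made at least one visit). It also tells us that on 2015-03-01 (the day before) the user was not active, which in turn means he/she made no visits in the preceding 10 days. Finally, it tells us that there was no visit on 2015-03-25 and that this was the 11th day without a visit (so the system "switches" the user to inactive).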
How can I generate the second table using SQL?
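Answer 0 (score: 0)
This is a bit tricky. One approach is to determine where each period of activity begins, then take a cumulative sum of those start flags in date order. That cumulative sum provides the criterion for aggregation.
The following flags the rows where a period of activity begins: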
select t.*,
       (case when date > lag(date) over (partition by user_name order by date) + 10 -- date arithmetic varies by database
             then 1
             else 0
        end) as StartPeriodFlag
from table t; -- "table" is a placeholder for the actual table name
The cumulative sum then provides the information needed for grouping:
with t as (
      select t.*,
             (case when date > lag(date) over (partition by user_name order by date) + 10 -- date arithmetic varies by database
                   then 1
                   else 0
              end) as StartPeriodFlag
      from table t -- "table" is a placeholder for the actual table name
     )
select user_name, min(date) as startdate,
       max(date) + 10 as enddate
from (select t.*,
             sum(StartPeriodFlag) over (partition by user_name order by date) as grp
      from t
     ) t
group by user_name, grp;
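As noted in the comments in the code, date arithmetic varies by database. This uses a simple + 10, but the exact function or syntax may differ depending on the database.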
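To make the flag-and-running-sum idea easier to follow outside any particular SQL dialect, here is a minimal Python sketch of the same steps. It is an illustration rather than part of the original answer: the function name active_ranges and the WINDOW constant are made up for this example, and it assumes the same convention as the query above (the end date is the last visit plus 10 days).

```python
from datetime import date, timedelta
from itertools import accumulate, groupby

WINDOW = timedelta(days=10)  # assumption: same 10-day window as the query


def active_ranges(visits):
    """visits: iterable of (user_name, visit_date) pairs.

    Returns a list of (user_name, start, end) tuples.
    """
    result = []
    for user, rows in groupby(sorted(visits), key=lambda r: r[0]):
        days = [d for _, d in rows]
        # StartPeriodFlag: 1 when the gap to the previous visit exceeds the window.
        # The first visit has no predecessor and gets 0, like lag() returning NULL.
        flags = [0] + [int(cur > prev + WINDOW) for prev, cur in zip(days, days[1:])]
        # The running sum of the flags is the group id (grp in the query).
        grps = list(accumulate(flags))
        # Aggregate per group: min(date) and max(date) + 10.
        for _, members in groupby(zip(grps, days), key=lambda p: p[0]):
            member_days = [d for _, d in members]
            result.append((user, min(member_days), max(member_days) + WINDOW))
    return result


visits = [
    ("cat005", date(2015, 6, 3)),
    ("cat005", date(2015, 6, 8)),
    ("dog009", date(2015, 6, 1)),
    ("dog009", date(2015, 6, 19)),
]
for row in active_ranges(visits):
    print(row)
# ('cat005', datetime.date(2015, 6, 3), datetime.date(2015, 6, 18))
# ('dog009', datetime.date(2015, 6, 1), datetime.date(2015, 6, 11))
# ('dog009', datetime.date(2015, 6, 19), datetime.date(2015, 6, 29))
```

For dog009 the second visit comes 18 days after the first, so its flag is 1 and the running sum puts the two visits in different groups. The two cat005 visits are only 5 days apart and stay in one group.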
Answer 1 (score: 0)
Oracle 11g R2 Schema Setup:
CREATE TABLE ACTIVITY ( user_name, "date", number_of_visits ) AS
SELECT 'cat005', DATE'2015-06-03', 5 FROM DUAL
UNION ALL SELECT 'cat005', DATE'2015-06-08', 1 FROM DUAL
UNION ALL SELECT 'dog009', DATE'2015-06-01', 7 FROM DUAL
UNION ALL SELECT 'dog009', DATE'2015-06-19', 2 FROM DUAL
Query 1:
WITH changes AS (
  SELECT user_name,
         "date",
         CASE WHEN "date" <= LAG( "date" ) OVER ( PARTITION BY user_name ORDER BY "date" ) + INTERVAL '10' DAY
              THEN 0
              ELSE 1 END AS change_group
  FROM   ACTIVITY
),
groups AS (
  SELECT user_name,
         "date",
         SUM( change_group ) OVER ( PARTITION BY user_name ORDER BY "date" ) AS grp
  FROM   changes
)
SELECT user_name,
       MIN( "date" ) AS activity_start,
       MAX( "date" ) + INTERVAL '10' DAY AS activity_end
FROM   groups
GROUP BY
       USER_NAME,
       GRP
Results:
| USER_NAME | ACTIVITY_START | ACTIVITY_END |
|-----------|------------------------|------------------------|
| dog009 | June, 19 2015 00:00:00 | June, 29 2015 00:00:00 |
| dog009 | June, 01 2015 00:00:00 | June, 11 2015 00:00:00 |
| cat005 | June, 03 2015 00:00:00 | June, 18 2015 00:00:00 |
Query 2:
WITH changes AS (
  SELECT user_name,
         "date",
         CASE WHEN "date" <= LAG( "date" ) OVER ( PARTITION BY user_name ORDER BY "date" ) + INTERVAL '10' DAY
              THEN null
              ELSE "date" END AS first_date,
         CASE WHEN "date" >= LEAD( "date" ) OVER ( PARTITION BY user_name ORDER BY "date" ) - INTERVAL '10' DAY
              THEN null
              ELSE "date" + INTERVAL '10' DAY END AS last_date
  FROM   ACTIVITY
)
SELECT DISTINCT
       user_name,
       LAST_VALUE( first_date ) IGNORE NULLS OVER ( PARTITION BY user_name ORDER BY "date" ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS activity_start,
       FIRST_VALUE( last_date ) IGNORE NULLS OVER ( PARTITION BY user_name ORDER BY "date" ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ) AS activity_end
FROM   changes
Results:
| USER_NAME | ACTIVITY_START | ACTIVITY_END |
|-----------|------------------------|------------------------|
| cat005 | June, 03 2015 00:00:00 | June, 18 2015 00:00:00 |
| dog009 | June, 01 2015 00:00:00 | June, 11 2015 00:00:00 |
| dog009 | June, 19 2015 00:00:00 | June, 29 2015 00:00:00 |
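Query 2 comes without an explanation, so here is a rough Python sketch of what it appears to do: mark the rows that open a period (first_date) and the rows that close one (last_date), then fill each row with the nearest opening before it and the nearest closing after it, and finally de-duplicate. The function name active_ranges_by_boundaries and the WINDOW constant are invented for this illustration. It is a reading of the query, not Oracle's actual execution.

```python
from datetime import date, timedelta
from itertools import groupby

WINDOW = timedelta(days=10)  # assumption: mirrors INTERVAL '10' DAY in the query


def active_ranges_by_boundaries(visits):
    """visits: iterable of (user_name, visit_date) pairs.

    Returns a sorted list of distinct (user_name, start, end) tuples.
    """
    ranges = set()
    for user, rows in groupby(sorted(visits), key=lambda r: r[0]):
        days = [d for _, d in rows]
        n = len(days)
        # first_date: the visit date when no earlier visit lies within the window.
        first = [d if i == 0 or d > days[i - 1] + WINDOW else None
                 for i, d in enumerate(days)]
        # last_date: visit date + window when no later visit lies within the window.
        last = [d + WINDOW if i == n - 1 or d < days[i + 1] - WINDOW else None
                for i, d in enumerate(days)]
        # Forward fill, like LAST_VALUE ... IGNORE NULLS up to the current row.
        starts, current = [], None
        for f in first:
            current = f if f is not None else current
            starts.append(current)
        # Backward fill, like FIRST_VALUE ... IGNORE NULLS from the current row on.
        ends, current = [], None
        for l in reversed(last):
            current = l if l is not None else current
            ends.append(current)
        ends.reverse()
        # The set plays the role of SELECT DISTINCT.
        ranges.update((user, s, e) for s, e in zip(starts, ends))
    return sorted(ranges)


visits = [
    ("cat005", date(2015, 6, 3)),
    ("cat005", date(2015, 6, 8)),
    ("dog009", date(2015, 6, 1)),
    ("dog009", date(2015, 6, 19)),
]
for row in active_ranges_by_boundaries(visits):
    print(row)
# ('cat005', datetime.date(2015, 6, 3), datetime.date(2015, 6, 18))
# ('dog009', datetime.date(2015, 6, 1), datetime.date(2015, 6, 11))
# ('dog009', datetime.date(2015, 6, 19), datetime.date(2015, 6, 29))
```

On the sample data this yields the same three ranges as the Results table above. Both cat005 rows fill to the same start and end, which is why the de-duplication step is needed.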