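I need to analyze some weblogs and determine whether a user has visited once, taken a year's break, and then visited again. I want to add a flag (Y/N) to every row whose VisitID meets that condition.
How would I go about writing this SQL?
Here are the fields I have that I think need to be used (by analyzing the timestamp of the first page of each visit):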
select VisitID, UserID, TimeStamp from page_view_t where pageNum = 1;
Thanks, any help is much appreciated.
Answer 0 (score: 5)
You could rank each user's rows, then join the ranked row set to itself to compare adjacent rows:
;
WITH ranked AS (
  SELECT
    *,
    rnk = ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY TimeStamp)
  FROM page_view_t
  WHERE pageNum = 1  -- one row per visit: the first page, as in the question
),
flagged AS (
  SELECT
    *,
    IsReturnVisit = CASE
      WHEN EXISTS (
        -- the same user's immediately preceding visit, at least a year earlier
        SELECT *
        FROM ranked
        WHERE UserID = r.UserID
          AND rnk = r.rnk - 1
          AND TimeStamp <= DATEADD(YEAR, -1, r.TimeStamp)
      )
      THEN 'Y'
      ELSE 'N'
    END
  FROM ranked r
)
SELECT
  VisitID,
  UserID,
  TimeStamp,
  IsReturnVisit
FROM flagged
Note: the query above flags only return visits.
UPDATE
To flag first visits as well as return visits, the flagged CTE could be modified like this:
…
  SELECT
    r.*,  -- qualified: a bare * would expose duplicate column names from r and p
    IsFirstOrReturnVisit = CASE
      WHEN p.UserID IS NULL OR r.TimeStamp >= DATEADD(YEAR, 1, p.TimeStamp)
      THEN 'Y'
      ELSE 'N'
    END
  FROM ranked r
  LEFT JOIN ranked p ON r.UserID = p.UserID AND r.rnk = p.rnk + 1
…
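As a rough sketch only: assuming SQL Server 2012 or later (where LAG is available) and exactly one pageNum = 1 row per visit, the same adjacent-row comparison might be written without the self-join. The names first_pages and PrevTimeStamp are made up for illustration.

```sql
-- Sketch under the assumptions above; not tested against the asker's data.
WITH first_pages AS (
  SELECT
    VisitID,
    UserID,
    TimeStamp,
    -- timestamp of the same user's previous visit; NULL for the first visit
    PrevTimeStamp = LAG(TimeStamp) OVER (PARTITION BY UserID ORDER BY TimeStamp)
  FROM page_view_t
  WHERE pageNum = 1
)
SELECT
  VisitID,
  UserID,
  TimeStamp,
  IsReturnVisit = CASE
    WHEN PrevTimeStamp <= DATEADD(YEAR, -1, TimeStamp) THEN 'Y'
    ELSE 'N'  -- also covers first visits, where PrevTimeStamp is NULL
  END
FROM first_pages;
```

To flag first visits too, the condition could be extended with OR PrevTimeStamp IS NULL.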
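Answer 1 (score: 1)
The other answer got there first, but since I spent a lot of time on this and it takes a completely different approach, I may as well post it :D.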
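SELECT pv2.VisitID,
       pv2.UserID,
       pv2.TimeStamp,
       CASE WHEN pv1.VisitID IS NOT NULL
             AND pv3.VisitID IS NULL
            THEN 'YES' ELSE 'NO' END AS IsReturnVisit
FROM page_view_t pv2
-- pv1: another visit by the same user, at least a year away from pv2
LEFT JOIN page_view_t pv1 ON pv1.UserID = pv2.UserID
    AND pv1.VisitID <> pv2.VisitID
    AND (pv1.TimeStamp <= DATEADD(YEAR, -1, pv2.TimeStamp)
      OR pv2.TimeStamp <= DATEADD(YEAR, -1, pv1.TimeStamp))
    AND pv1.pageNum = 1
-- pv3: any visit strictly between pv1 and pv2, which would break the gap
LEFT JOIN page_view_t pv3 ON pv1.UserID = pv3.UserID
    -- BETWEEN is inclusive, so pv1 and pv2 themselves must be excluded,
    -- otherwise pv3 always matches pv1 and the flag is always 'NO'
    AND pv3.VisitID NOT IN (pv1.VisitID, pv2.VisitID)
    AND (pv3.TimeStamp BETWEEN pv1.TimeStamp AND pv2.TimeStamp
      OR pv3.TimeStamp BETWEEN pv2.TimeStamp AND pv1.TimeStamp)
    AND pv3.pageNum = 1
WHERE pv2.pageNum = 1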
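Answer 2 (score: 1)
Assuming the page_view_t table stores the UserID and TimeStamp of each user visit, the following query returns the users who took a break of at least one year (365 days) between two consecutive visits.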
select t1.UserID
from page_view_t t1
where (
    select datediff(day, max(t2.[TimeStamp]), t1.[TimeStamp])
    from page_view_t t2
    where t2.UserID = t1.UserID and t2.[TimeStamp] < t1.[TimeStamp]
    group by t2.UserID
) >= 365
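Note that a 365-day threshold is not quite the same test as the DATEADD(YEAR, ...) comparison used in the other answers, because of leap years. A small worked example, assuming T-SQL and two made-up dates (the variable names are purely illustrative):

```sql
-- Illustrative dates only: the gap spans 365 days but stops one day short
-- of a full calendar year, because 2016-02-29 falls before 2016-03-01.
DECLARE @prev date = '2015-03-01', @cur date = '2016-02-29';

SELECT
  DaysApart = DATEDIFF(day, @prev, @cur),              -- 365
  FullCalendarYear = CASE
    WHEN @cur >= DATEADD(YEAR, 1, @prev) THEN 'Y'      -- DATEADD gives 2016-03-01
    ELSE 'N'                                           -- so this returns 'N'
  END;
```

Which definition of "a year" is wanted depends on the analysis; the two only disagree by a day around leap years.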