I have two tables:
posts (id, published_at)
posts_images (id, post_id, image_url(null or url string))
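Every post has at least one posts_images record, and it may have several.
My goal: a query that shows the percentage of posts that have one or more images, broken down by week (7 days).
Here is my query: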
SELECT floor(datediff(p.created_at, curdate()) / 7) AS weeks_ago,
date(min(p.created_at)) AS "Date Start",
date(max(p.created_at)) AS "Date End",
count(DISTINCT p.id) AS "Posts in Cohort",
count(pc.image_url) / count(p.id) AS "Post w 1 or more Images Ratio"
FROM posts p
INNER JOIN posts_images pc
ON p.id = pc.post_id
WHERE p.published_at IS NOT NULL
GROUP BY weeks_ago
ORDER BY weeks_ago DESC;
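The query runs and returns data, but since each post has one or more posts_images rows, I'm not sure I'm doing the JOIN correctly. I'm worried that SQL picks the first posts_images record instead of looking at all of them.
Am I doing this right?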
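One way to check the JOIN concern is to run the join on a few rows and look at what comes back. The sketch below uses Python's built-in sqlite3 module in place of MySQL, with hypothetical toy data (post 1 has two image rows, post 2 has a single row whose image_url is NULL). It shows that an INNER JOIN returns every matching posts_images row, not just the first, and that count(pc.image_url) / count(p.id) is then a ratio over image rows rather than over posts.

```python
import sqlite3

# Hypothetical toy data: post 1 has two image rows, post 2 has one row with a NULL url.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, published_at TEXT);
CREATE TABLE posts_images (id INTEGER PRIMARY KEY, post_id INTEGER, image_url TEXT);
INSERT INTO posts VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO posts_images VALUES (1, 1, 'a.png'), (2, 1, 'b.png'), (3, 2, NULL);
""")

# The join returns every matching posts_images row, not only the first one.
joined = conn.execute("""
    SELECT p.id, pc.image_url
    FROM posts p
    INNER JOIN posts_images pc ON p.id = pc.post_id
    ORDER BY pc.id
""").fetchall()
print(joined)  # [(1, 'a.png'), (1, 'b.png'), (2, None)]

# The question's ratio, computed over the joined rows.
# "* 1.0" forces real division, because SQLite's "/" truncates on integers (MySQL's does not).
posts_in_cohort, ratio = conn.execute("""
    SELECT COUNT(DISTINCT p.id), COUNT(pc.image_url) * 1.0 / COUNT(p.id)
    FROM posts p
    INNER JOIN posts_images pc ON p.id = pc.post_id
""").fetchone()
# 2 posts; ratio is 2/3 (non-NULL image rows / joined rows), although only 1 of the 2 posts has an image
print(posts_in_cohort, ratio)
```

So the JOIN does look at all image rows; the catch is that the ratio then counts image rows, not posts.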
Answer 0 (score: 3)
I think you are better off with two levels of aggregation:
SELECT floor(datediff(p.created_at, curdate()) / 7) AS weeks_ago,
date(min(p.created_at)) AS "Date Start",
date(max(p.created_at)) AS "Date End",
count(*) as "Posts in Cohort",
avg(has_image) as "Post w 1 or more Images Ratio"
FROM (SELECT p.id, p.created_at,
( MAX(pi.image_url) IS NOT NULL ) as has_image
FROM posts p JOIN
posts_images pi
ON p.id = pi.post_id
WHERE p.published_at IS NOT NULL
GROUP BY p.id
) p
GROUP BY weeks_ago
ORDER BY weeks_ago DESC;
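The two-level aggregation can be tried out with Python's sqlite3 module. This is a sketch under assumptions, not the answerer's code: SQLite stands in for MySQL, so DATEDIFF/CURDATE are replaced by a julianday() difference against a hypothetical fixed date TODAY, with the arguments ordered so weeks_ago is non-negative (ascending order then lists the most recent week first, like DESC on the negative values). posts is assumed to have a created_at column, since both queries use one although the table listing does not show it. The rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, created_at TEXT, published_at TEXT);
CREATE TABLE posts_images (id INTEGER PRIMARY KEY, post_id INTEGER, image_url TEXT);
INSERT INTO posts VALUES
  (1, '2024-01-14', '2024-01-14'),
  (2, '2024-01-13', '2024-01-13'),
  (3, '2024-01-05', '2024-01-05'),
  (4, '2024-01-04', NULL);               -- unpublished, filtered out
INSERT INTO posts_images VALUES
  (1, 1, 'a.png'), (2, 1, 'b.png'),      -- post 1: two images
  (3, 2, NULL),                          -- post 2: placeholder row only
  (4, 3, 'c.png'),
  (5, 4, 'd.png');
""")

TODAY = "2024-01-15"  # hypothetical "current date" so the result is reproducible

rows = conn.execute("""
    SELECT CAST((julianday(:today) - julianday(p.created_at)) / 7 AS INTEGER) AS weeks_ago,
           date(MIN(p.created_at)) AS date_start,
           date(MAX(p.created_at)) AS date_end,
           COUNT(*) AS posts_in_cohort,
           AVG(has_image) AS ratio
    FROM (SELECT p.id, p.created_at,
                 (MAX(pi.image_url) IS NOT NULL) AS has_image  -- 1 if any non-NULL url
          FROM posts p
          JOIN posts_images pi ON p.id = pi.post_id
          WHERE p.published_at IS NOT NULL
          GROUP BY p.id) p                                     -- one row per post
    GROUP BY weeks_ago
    ORDER BY weeks_ago
""", {"today": TODAY}).fetchall()

for row in rows:
    print(row)
# (0, '2024-01-13', '2024-01-14', 2, 0.5)
# (1, '2024-01-05', '2024-01-05', 1, 1.0)
```

Because the inner query collapses each post to one row with a 0/1 has_image flag, AVG(has_image) is directly the fraction of posts with at least one non-NULL image_url.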
Answer 1 (score: 1)
I would start by finding the posts that have multiple images:
SELECT post_id, COUNT(*) AS ct
FROM posts_images
GROUP BY post_id
HAVING ct > 1
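To see what this first query returns, here is a sketch with SQLite via Python's sqlite3 and hypothetical rows. It uses HAVING COUNT(*) > 1 rather than HAVING ct > 1 to stay portable (MySQL accepts the alias form). Since image_url may be NULL, the sketch also contrasts COUNT(*), which counts every posts_images row, with COUNT(image_url), which counts only rows with a URL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts_images (id INTEGER PRIMARY KEY, post_id INTEGER, image_url TEXT);
INSERT INTO posts_images VALUES
  (1, 1, 'a.png'), (2, 1, 'b.png'),   -- post 1: two URLs
  (3, 2, NULL),                       -- post 2: one placeholder row
  (4, 3, 'c.png'), (5, 3, NULL);      -- post 3: two rows, one URL
""")

# Posts with more than one posts_images row (NULL urls included).
multi_rows = conn.execute("""
    SELECT post_id, COUNT(*) AS ct
    FROM posts_images
    GROUP BY post_id
    HAVING COUNT(*) > 1
    ORDER BY post_id
""").fetchall()
print(multi_rows)  # [(1, 2), (3, 2)]

# Posts with more than one non-NULL image_url.
multi_urls = conn.execute("""
    SELECT post_id, COUNT(image_url) AS ct
    FROM posts_images
    GROUP BY post_id
    HAVING COUNT(image_url) > 1
    ORDER BY post_id
""").fetchall()
print(multi_urls)  # [(1, 2)]
```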
Then I would go to posts to find the weeks involved:
SELECT floor(datediff(p.created_at, curdate()) / 7) AS weeks_ago,
date(min(p.created_at)) AS "Date Start",
date(max(p.created_at)) AS "Date End",
count(*) AS "Posts in Cohort",
ROUND(SUM(x.ct) / count(*), 3) AS "Post w 1 or more Images Ratio"
FROM ( .. the query above .. ) AS x
JOIN posts AS p ON x.post_id = p.id
GROUP BY weeks_ago
ORDER BY weeks_ago DESC;
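Compared with your approach, the advantage is that the intermediate temporary table is smaller (one row per post instead of one row per image).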
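The size difference can be illustrated by counting rows. This sketch uses Python's sqlite3 with hypothetical data (3 posts with 1, 3 and 4 image rows) and compares the raw posts/posts_images join with the per-post GROUP BY derived table, with and without the answer's HAVING filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY, created_at TEXT);
CREATE TABLE posts_images (id INTEGER PRIMARY KEY, post_id INTEGER, image_url TEXT);
""")
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [(1, "2024-01-01"), (2, "2024-01-02"), (3, "2024-01-03")])
# Hypothetical image rows: post 1 -> 1 row, post 2 -> 3 rows, post 3 -> 4 rows.
images = [(1, "img.png")] + [(2, "img.png")] * 3 + [(3, "img.png")] * 4
conn.executemany("INSERT INTO posts_images (post_id, image_url) VALUES (?, ?)", images)

# Joining first produces one row per image.
join_rows = conn.execute(
    "SELECT COUNT(*) FROM posts p JOIN posts_images pi ON p.id = pi.post_id"
).fetchone()[0]

# Grouping first produces one row per post ...
per_post_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT post_id, COUNT(*) AS ct FROM posts_images GROUP BY post_id) AS x"
).fetchone()[0]

# ... and the HAVING filter keeps only posts with several rows.
multi_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT post_id, COUNT(*) AS ct FROM posts_images"
    " GROUP BY post_id HAVING COUNT(*) > 1) AS x"
).fetchone()[0]

print(join_rows, per_post_rows, multi_rows)  # 8 3 2
```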
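Potential problems:
- FLOOR: the "weeks" are counted back from the current date rather than aligned to calendar weeks. Work backward to get the start/end of each "week" to fix this.
- LEFT JOIN: a plain JOIN against the subquery drops posts that do not appear in it, so a LEFT JOIN from posts may be needed.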
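The FLOOR and LEFT JOIN points can be checked with a small Python sketch; it is an illustration under assumptions, not the answerer's code. The FLOOR part mirrors FLOOR(DATEDIFF(...) / 7) in plain Python for posts created 0 to 14 days ago and shows one concrete pitfall: with DATEDIFF(created_at, CURDATE()) the values are negative, so week 0 holds only today's posts. The LEFT JOIN part uses SQLite in place of MySQL with hypothetical rows.

```python
import math
import sqlite3
from collections import defaultdict

def weeks_ago_as_written(days_ago: int) -> int:
    # Mirrors FLOOR(DATEDIFF(created_at, CURDATE()) / 7): DATEDIFF is negative for past dates.
    return math.floor(-days_ago / 7)

def weeks_ago_reversed(days_ago: int) -> int:
    # Mirrors FLOOR(DATEDIFF(CURDATE(), created_at) / 7): non-negative for past dates.
    return math.floor(days_ago / 7)

as_written = defaultdict(list)
reversed_order = defaultdict(list)
for d in range(15):  # posts created 0..14 days ago
    as_written[weeks_ago_as_written(d)].append(d)
    reversed_order[weeks_ago_reversed(d)].append(d)

print(dict(as_written))      # {0: [0], -1: [1, 2, 3, 4, 5, 6, 7], -2: [8, 9, 10, 11, 12, 13, 14]}
print(dict(reversed_order))  # {0: [0, 1, 2, 3, 4, 5, 6], 1: [7, 8, 9, 10, 11, 12, 13], 2: [14]}

# LEFT JOIN: hypothetical data where only post 1 has more than one image row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY);
CREATE TABLE posts_images (id INTEGER PRIMARY KEY, post_id INTEGER, image_url TEXT);
INSERT INTO posts VALUES (1), (2), (3);
INSERT INTO posts_images (post_id, image_url) VALUES
  (1, 'a.png'), (1, 'b.png'), (2, 'c.png'), (3, 'd.png');
""")
multi = "SELECT post_id, COUNT(*) AS ct FROM posts_images GROUP BY post_id HAVING COUNT(*) > 1"

# Plain JOIN: posts missing from the subquery vanish from the cohort count.
inner_count = conn.execute(
    f"SELECT COUNT(*) FROM ({multi}) AS x JOIN posts AS p ON x.post_id = p.id"
).fetchone()[0]
# LEFT JOIN from posts: every post is kept; x.ct is NULL for single-image posts.
left_count = conn.execute(
    f"SELECT COUNT(*) FROM posts AS p LEFT JOIN ({multi}) AS x ON x.post_id = p.id"
).fetchone()[0]
print(inner_count, left_count)  # 1 3
```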