I have a website that is mirrored across two subdomains, so I have two separate analytics data sets. I have the following tables:
|------------------------------|
| table_a                      |
|------------------------------|
| url             | mod_date   |
|------------------------------|
| /foo/index.html | 2009-10-24 |
| /bar/index.php  | 2010-01-04 |
| /foo/bar.html   | 2009-01-04 |
|------------------------------|
|---------------------------------------|
| table_b                               |
|---------------------------------------|
| url             | views | access_date |
|---------------------------------------|
| /foo/index.html | 35000 | 2009-12-01  |
| /foo/index.html | 20000 | 2010-02-01  |
| /bar/index.php  | 35000 | 2010-01-01  |
| /bar/index.php  | 15000 | 2011-01-01  |
|---------------------------------------|
|---------------------------------------|
| table_c                               |
|---------------------------------------|
| url             | views | access_date |
|---------------------------------------|
| /foo/index.html | 35000 | 2009-10-01  |
| /foo/bar.html   | 10000 | 2011-05-01  |
| /bar/index.php  | 35000 | 2011-08-01  |
| /bar/index.php  | 15000 | 2012-04-01  |
|---------------------------------------|
I have the following query:
SELECT
    a.url
    ,DATE_FORMAT(a.mod_date, '%d/%m/%Y') AS 'mod_date'
    ,DATE_FORMAT(MIN(b.access_date), '%d/%m/%Y') AS 'first_date'
    ,DATE_FORMAT(MAX(b.access_date), '%d/%m/%Y') AS 'last_date'
    ,SUM(IFNULL(b.views,0)) + SUM(IFNULL(c.views,0)) AS 'page_views'
    ,DATEDIFF(MAX(b.access_date),MIN(b.access_date)) AS 'days'
    ,ROUND(SUM(b.views) / (DATEDIFF(MAX(b.access_date),MIN(b.access_date)) / 30.44)) AS 'b_mean_monthly_hits'
    ,ROUND(SUM(c.views) / (DATEDIFF(MAX(c.access_date),MIN(c.access_date)) / 30.44)) AS 'c_mean_monthly_hits'
FROM
    table_a a
LEFT JOIN
    table_b b ON b.url = a.url
LEFT JOIN
    table_c c ON c.url = a.url
GROUP BY a.url
HAVING ROUND(SUM(b.views) / (DATEDIFF(MAX(b.access_date),MIN(b.access_date)) / 30.44)) < 5
    AND ROUND(SUM(c.views) / (DATEDIFF(MAX(c.access_date),MIN(c.access_date)) / 30.44)) < 5
;
The result I am looking for is:
|----------------------------------------------------------------------------------------|
| results                                                                                |
|----------------------------------------------------------------------------------------|
| url             | mod_date   | first_date | last_date  | page_views | avg_monthly_hits |
|----------------------------------------------------------------------------------------|
| /foo/index.html | 2009-10-24 | 2009-10-01 | 2010-02-01 | 90000      | 22273            |
| /bar/index.php  | 2010-01-04 | 2010-01-01 | 2012-04-01 | 85000      | 3275             |
| /foo/bar.html   | 2009-01-04 | 2011-05-01 | 2011-06-01 | 10000      | 9819             |
|----------------------------------------------------------------------------------------|
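'avg_monthly_hits' is the sum of b.views and c.views (shown as 'page_views') divided by the number of months between the oldest and the newest access_date found in either table_b or table_c. I don't know how to get the number of months directly, so I take the number of days between those two dates and divide it by 30.44 (the average number of days in a month).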
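To make the formula concrete, here is the arithmetic for the /foo/index.html row of the expected results, written as a standalone MySQL statement. The dates and the view count are typed in as literals taken from the sample tables, so this only checks the formula and does not touch the real tables.

```sql
-- /foo/index.html: 90000 views in total, first access 2009-10-01, last access 2010-02-01.
SELECT
    DATEDIFF('2010-02-01', '2009-10-01') AS days,
    -- months are approximated as days / 30.44
    ROUND(90000 / (DATEDIFF('2010-02-01', '2009-10-01') / 30.44)) AS avg_monthly_hits;
-- days = 123, avg_monthly_hits = 22273
```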
I hope I have explained myself fully. :)
Answer 0 (score: 0)
Try this query. It would be good to have some dates to test it with.
select
    a.*,
    b.MinDate as `FirstDate`,
    b.MaxDate as `LastDate`,
    (ifnull(b.PSum,0) + ifnull(c.QSum,0)) as `TotalViews`,
    datediff(b.MaxDate,b.MinDate) as `Diff`,
    -- views divided by (days / 30.44); NULL when a url has fewer than two distinct dates in that table
    (b.PSum / (datediff(b.MaxDate,b.MinDate) / 30.44)) as `BMonthlyHits`,
    (c.QSum / (datediff(c.MaxDate,c.MinDate) / 30.44)) as `CMonthlyHits`
from table_a as a
left join (select url, min(access_date) as MinDate, max(access_date) as MaxDate, sum(views) as PSum from table_b group by url) as b on a.url = b.url
left join (select url, min(access_date) as MinDate, max(access_date) as MaxDate, sum(views) as QSum from table_c group by url) as c on a.url = c.url
having BMonthlyHits < 5 and CMonthlyHits < 5
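The derived tables in this answer aggregate table_b and table_c separately before joining them to table_a. A small sketch of why that matters, using inline rows that stand in for the /foo/index.html data from the question (the inline rows are an illustration only, not the real tables): when both tables are joined row by row on url, every row of one table is repeated once per matching row of the other, so the sums drift.

```sql
-- b has two rows for the url, c has one: the join yields 2 x 1 = 2 rows.
SELECT
    SUM(b.views) AS b_sum_after_join,
    SUM(c.views) AS c_sum_after_join
FROM (SELECT '/foo/index.html' AS url, 35000 AS views
      UNION ALL
      SELECT '/foo/index.html', 20000) AS b
JOIN (SELECT '/foo/index.html' AS url, 35000 AS views) AS c
  ON c.url = b.url;
-- b_sum_after_join = 55000 (correct), c_sum_after_join = 70000 (the single c row is counted twice)
```

Adding the two sums here would give 125000 instead of the 90000 in the expected results, which is why each table is summed on its own first.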
Answer 1 (score: 0)
If table_b and table_c have the same structure, just union them:
SELECT
    a.url,
    DATE_FORMAT(a.mod_date, '%d/%m/%Y') AS 'mod_date',
    DATE_FORMAT(MIN(u.access_date), '%d/%m/%Y') AS 'first_date',
    DATE_FORMAT(MAX(u.access_date), '%d/%m/%Y') AS 'last_date',
    SUM(u.views) AS 'page_views',
    DATEDIFF(MAX(u.access_date), MIN(u.access_date)) AS 'days',
    ROUND(SUM(u.views) / (DATEDIFF(MAX(u.access_date),MIN(u.access_date)) / 30.44)) AS 'avg_monthly_hits'
FROM table_a AS a
LEFT JOIN (
    -- UNION ALL keeps every row; a plain UNION would drop rows that are identical in both tables
    (SELECT url, views, access_date FROM table_b)
    UNION ALL
    (SELECT url, views, access_date FROM table_c)
) AS u USING (url)
GROUP BY a.url, a.mod_date
HAVING avg_monthly_hits < 5
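One detail worth checking with the union approach is the difference between UNION and UNION ALL: UNION removes duplicate rows, UNION ALL keeps them. The sketch below uses two made-up identical rows (the url '/x.html' and its values are hypothetical) to show the effect on a sum, as could happen if both mirrors logged the same url, views and access_date.

```sql
SELECT
    (SELECT SUM(views) FROM (
        SELECT '/x.html' AS url, 100 AS views, '2010-01-01' AS access_date
        UNION
        SELECT '/x.html', 100, '2010-01-01') AS dedup) AS sum_with_union,
    (SELECT SUM(views) FROM (
        SELECT '/x.html' AS url, 100 AS views, '2010-01-01' AS access_date
        UNION ALL
        SELECT '/x.html', 100, '2010-01-01') AS kept) AS sum_with_union_all;
-- sum_with_union = 100, sum_with_union_all = 200
```

For summing views from two mirrors, UNION ALL is usually the one that matches the intent.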
Answer 2 (score: 0)
In the end, a nested query solved the problem.
SELECT DISTINCT a.url
    , q.mod_date
    , q.min_date
    , q.max_date
    , (PERIOD_DIFF(DATE_FORMAT(q.max_date, '%Y%m'), DATE_FORMAT(q.min_date, '%Y%m')) + 1) AS 'months'
    , q.page_views
    , ROUND(q.page_views / (PERIOD_DIFF(DATE_FORMAT(q.max_date, '%Y%m'), DATE_FORMAT(q.min_date, '%Y%m')) + 1)) AS 'avg_monthly_hits'
FROM table_a a
INNER JOIN
    (SELECT
        a.url,
        a.mod_date,
        -- COALESCE keeps LEAST/GREATEST from returning NULL when a url is missing from one table
        LEAST(COALESCE(b.min_date, c.min_date), COALESCE(c.min_date, b.min_date)) AS min_date,
        GREATEST(COALESCE(b.max_date, c.max_date), COALESCE(c.max_date, b.max_date)) AS max_date,
        IFNULL(b.views, 0) + IFNULL(c.views, 0) AS page_views
    FROM
        table_a a
    LEFT JOIN
        -- each table is aggregated on its own so that rows are not multiplied by the other join
        (SELECT url, MIN(access_date) AS min_date, MAX(access_date) AS max_date, SUM(views) AS views
         FROM table_b GROUP BY url) b ON a.url = b.url
    LEFT JOIN
        (SELECT url, MIN(access_date) AS min_date, MAX(access_date) AS max_date, SUM(views) AS views
         FROM table_c GROUP BY url) c ON a.url = c.url
    ) q
    ON a.url = q.url
WHERE ROUND(q.page_views / (PERIOD_DIFF(DATE_FORMAT(q.max_date, '%Y%m'), DATE_FORMAT(q.min_date, '%Y%m')) + 1)) < 5
;
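The month count in this answer comes from PERIOD_DIFF on 'YYYYMM' values plus one, which counts calendar months inclusively and so differs from the days / 30.44 estimate in the question. A quick check with literals taken from the /foo/index.html sample row (the literals are for illustration only):

```sql
SELECT
    -- October 2009 through February 2010, both ends included
    PERIOD_DIFF(DATE_FORMAT('2010-02-01', '%Y%m'), DATE_FORMAT('2009-10-01', '%Y%m')) + 1 AS months,
    ROUND(90000 / (PERIOD_DIFF(DATE_FORMAT('2010-02-01', '%Y%m'), DATE_FORMAT('2009-10-01', '%Y%m')) + 1)) AS avg_monthly_hits;
-- months = 5, avg_monthly_hits = 18000
```

With the days / 30.44 formula the same row comes out at 22273, so the two approaches should not be expected to agree with each other.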