Complex (for me) MySQL date matching

Posted: 2012-07-04 12:45:14

Tags: mysql left-join datediff

I have a website mirrored across two subdomains, so I have two separate sets of analytics data. I have the following tables:

|------------------------------|
| table_a                      |
|------------------------------|
| url             | mod_date   |
|------------------------------|
| /foo/index.html | 2009-10-24 |
| /bar/index.php  | 2010-01-04 |
| /foo/bar.html   | 2009-01-04 |
|------------------------------|

|-----------------------------------------|
| table_b                                 |
|-----------------------------------------|
| url             | views | access_date   |
|-----------------------------------------|
| /foo/index.html | 35000 | 2009-12-01    |
| /foo/index.html | 20000 | 2010-02-01    |
| /bar/index.php  | 35000 | 2010-01-01    |
| /bar/index.php  | 15000 | 2011-01-01    |
|-----------------------------------------|

|-----------------------------------------|
| table_c                                 |
|-----------------------------------------|
| url             | views | access_date   |
|-----------------------------------------|
| /foo/index.html | 35000 | 2009-10-01    |
| /foo/bar.html   | 10000 | 2011-05-01    |
| /bar/index.php  | 35000 | 2011-08-01    |
| /bar/index.php  | 15000 | 2012-04-01    |
|-----------------------------------------|

I have the following query:

SELECT 
    a.url
    ,DATE_FORMAT(a.mod_date, '%d/%m/%Y') AS 'mod_date'
    ,DATE_FORMAT(MIN(b.access_date), '%d/%m/%Y') AS 'first_date'
    ,DATE_FORMAT(MAX(b.access_date), '%d/%m/%Y') AS 'last_date'
    ,SUM(ifnull(b.views,0)) + SUM(ifnull(c.views,0)) AS 'page_views'
    ,DATEDIFF(MAX(b.access_date),MIN(b.access_date)) AS 'days'
    ,ROUND(SUM(b.views) / (DATEDIFF(MAX(b.access_date),MIN(b.access_date)) / 30.44)) AS 'b_mean_monthly_hits'
    ,ROUND(SUM(c.views) / (DATEDIFF(MAX(c.access_date),MIN(c.access_date)) / 30.44)) AS 'c_mean_monthly_hits'
FROM
    table_a a
        LEFT JOIN
    table_b b ON b.url = a.url
        LEFT JOIN
    table_c c ON c.url = a.url
GROUP BY a.url
HAVING ROUND(SUM(b.views) / (DATEDIFF(MAX(b.access_date),MIN(b.access_date)) / 30.44)) < 5
AND ROUND(SUM(c.views) / (DATEDIFF(MAX(c.access_date),MIN(c.access_date)) / 30.44)) < 5
;
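Both LEFT JOINs in this query attach to table_a on url. A url with m rows in table_b and n rows in table_c therefore produces m x n joined rows, and SUM() counts each view figure more than once. The sketch below reproduces that effect. It uses Python's built-in sqlite3 (SQLite, not MySQL) as a stand-in database and loads only the /bar/index.php sample rows:

```python
import sqlite3

# Stand-in database: SQLite in memory instead of MySQL, holding only the
# /bar/index.php rows from the sample tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (url TEXT, mod_date TEXT);
CREATE TABLE table_b (url TEXT, views INTEGER, access_date TEXT);
CREATE TABLE table_c (url TEXT, views INTEGER, access_date TEXT);
INSERT INTO table_a VALUES ('/bar/index.php', '2010-01-04');
INSERT INTO table_b VALUES ('/bar/index.php', 35000, '2010-01-01'),
                           ('/bar/index.php', 15000, '2011-01-01');
INSERT INTO table_c VALUES ('/bar/index.php', 35000, '2011-08-01'),
                           ('/bar/index.php', 15000, '2012-04-01');
""")

# Same join shape as the query: every table_b row is paired with every
# table_c row for the url, so 2 x 2 = 4 rows reach SUM().
joined_rows, page_views = conn.execute("""
    SELECT COUNT(*),
           SUM(IFNULL(b.views, 0)) + SUM(IFNULL(c.views, 0))
    FROM table_a a
    LEFT JOIN table_b b ON b.url = a.url
    LEFT JOIN table_c c ON c.url = a.url
    GROUP BY a.url
""").fetchone()

# Summing each table on its own gives the real total.
true_views = conn.execute("""
    SELECT (SELECT SUM(views) FROM table_b WHERE url = '/bar/index.php')
         + (SELECT SUM(views) FROM table_c WHERE url = '/bar/index.php')
""").fetchone()[0]

print(joined_rows, page_views, true_views)  # 4 200000 100000
```

Each view figure is counted twice, so the joined sum comes out at double the real value.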

The result I'm looking for is:

|------------------------------------------------------------------------------------------|
| results                                                                                  |
|------------------------------------------------------------------------------------------|
| url             | mod_date   | first_date | last_date  | page_views   | avg_monthly_hits |
|------------------------------------------------------------------------------------------|
| /foo/index.html | 2009-10-24 | 2009-10-01 | 2010-02-01 | 90000        | 22273            |
| /bar/index.php  | 2010-01-04 | 2010-01-01 | 2012-04-01 | 85000        | 3275             |
| /foo/bar.html   | 2009-01-04 | 2011-05-01 | 2011-06-01 | 10000        | 9819             |
|------------------------------------------------------------------------------------------|

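'avg_monthly_hits' is the sum of b.views and c.views (reported as 'page_views') divided by the span between the oldest and newest access_date across table_b and table_c, measured in months. I compute that span as the number of days (I don't know how to get months directly) divided by 30.44, the average number of days in a month.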
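The calculation can be checked by hand. The sketch below is a minimal Python version that assumes plain `datetime.date` values and uses a hypothetical helper called `avg_monthly_hits`. For /foo/index.html, the span from 2009-10-01 to 2010-02-01 is 123 days, or about 4.04 months. Dividing 90000 by that span rounds to 22273, which matches the expected results.

```python
from datetime import date

AVG_DAYS_PER_MONTH = 30.44  # figure used in the question

def avg_monthly_hits(total_views, first, last):
    """Views divided by the span in months, where months = days / 30.44."""
    days = (last - first).days
    return round(total_views / (days / AVG_DAYS_PER_MONTH))

# /foo/index.html: 35000 + 20000 (table_b) + 35000 (table_c) = 90000 views,
# oldest access_date 2009-10-01, newest 2010-02-01 (123 days apart).
print(avg_monthly_hits(90000, date(2009, 10, 1), date(2010, 2, 1)))  # 22273
```

A url whose rows all share a single access_date has a 0-day span. This sketch would raise ZeroDivisionError in that case, whereas MySQL returns NULL for a division by zero.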

I hope I've explained myself clearly. :)

3 answers:

Answer 0 (score: 0)

Try this query. It would be good to have some sample data to test it against.

select
  a.*,
  b.MinDate as `FirstDate`,
  b.MaxDate as `LastDate`,
  (ifnull(b.PSum,0) + ifnull(c.QSum,0)) as `TotalViews`,
  datediff(b.MaxDate,b.MinDate) as `Diff`,
  -- views / (days / 30.44), computed separately for each table
  round(b.PSum / (datediff(b.MaxDate,b.MinDate) / 30.44)) as `BMonthlyHits`,
  round(c.QSum / (datediff(c.MaxDate,c.MinDate) / 30.44)) as `CMonthlyHits`
from table_a as a
-- aggregate each table per url first, so the two joins cannot multiply rows
left join (select url, min(access_date) as MinDate, max(access_date) as MaxDate, sum(views) as PSum from table_b group by url) as b on a.url = b.url
left join (select url, min(access_date) as MinDate, max(access_date) as MaxDate, sum(views) as QSum from table_c group by url) as c on a.url = c.url
group by a.url
HAVING BMonthlyHits < 5 and CMonthlyHits < 5
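This answer aggregates table_b and table_c separately in derived tables and only then joins them to table_a. The sketch below shows that pattern and computes only the combined page views. It again assumes SQLite through Python's sqlite3 as a stand-in for MySQL, loaded with all the sample rows:

```python
import sqlite3

# Stand-in database: SQLite in memory instead of MySQL, with the sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (url TEXT, mod_date TEXT);
CREATE TABLE table_b (url TEXT, views INTEGER, access_date TEXT);
CREATE TABLE table_c (url TEXT, views INTEGER, access_date TEXT);
INSERT INTO table_a VALUES ('/foo/index.html', '2009-10-24'),
                           ('/bar/index.php', '2010-01-04'),
                           ('/foo/bar.html', '2009-01-04');
INSERT INTO table_b VALUES ('/foo/index.html', 35000, '2009-12-01'),
                           ('/foo/index.html', 20000, '2010-02-01'),
                           ('/bar/index.php', 35000, '2010-01-01'),
                           ('/bar/index.php', 15000, '2011-01-01');
INSERT INTO table_c VALUES ('/foo/index.html', 35000, '2009-10-01'),
                           ('/foo/bar.html', 10000, '2011-05-01'),
                           ('/bar/index.php', 35000, '2011-08-01'),
                           ('/bar/index.php', 15000, '2012-04-01');
""")

# Each derived table is collapsed to one row per url before the join,
# so joining both of them cannot multiply rows.
rows = conn.execute("""
    SELECT a.url, IFNULL(b.total, 0) + IFNULL(c.total, 0) AS page_views
    FROM table_a a
    LEFT JOIN (SELECT url, SUM(views) AS total FROM table_b GROUP BY url) b
           ON b.url = a.url
    LEFT JOIN (SELECT url, SUM(views) AS total FROM table_c GROUP BY url) c
           ON c.url = a.url
    ORDER BY a.url
""").fetchall()

for url, views in rows:
    print(url, views)
```

With the sample rows, this gives 90000 for /foo/index.html and 100000 for /bar/index.php. The second figure is the plain sum of the four rows for that url.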

Answer 1 (score: 0)

If table_b and table_c have the same structure, just UNION them together:

SELECT
 a.url,
 DATE_FORMAT(a.mod_date, '%d/%m/%Y') AS 'mod_date',
 DATE_FORMAT(MIN(u.access_date), '%d/%m/%Y') AS 'first_date',
 DATE_FORMAT(MAX(u.access_date), '%d/%m/%Y') AS 'last_date',
 SUM(u.views) AS 'page_views',
 DATEDIFF(MAX(u.access_date), MIN(u.access_date)) AS 'days',
 ROUND(SUM(u.views) / (DATEDIFF(MAX(u.access_date),MIN(u.access_date)) / 30.44)) AS 'avg_monthly_hits'
FROM table_a AS a 
LEFT JOIN (
   (SELECT * FROM table_b) 
   UNION ALL  -- ALL keeps rows that happen to be identical in both tables
   (SELECT * FROM table_c)
) AS u USING (url)
GROUP BY a.url
HAVING avg_monthly_hits < 5
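Plain UNION removes rows that are identical in both tables. That could happen if both mirrors logged the same url, view count and date. UNION ALL keeps every row, so the views are summed in full. In the sketch below, SQLite via Python's sqlite3 stands in for MySQL, and the duplicated row is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_b (url TEXT, views INTEGER, access_date TEXT);
CREATE TABLE table_c (url TEXT, views INTEGER, access_date TEXT);
-- Hypothetical: the same figures logged on both mirrors for one month.
INSERT INTO table_b VALUES ('/foo/index.html', 35000, '2009-10-01');
INSERT INTO table_c VALUES ('/foo/index.html', 35000, '2009-10-01');
""")

def total(set_op):
    """Sum views over table_b combined with table_c by the given set operator."""
    sql = f"SELECT SUM(views) FROM (SELECT * FROM table_b {set_op} SELECT * FROM table_c)"
    return conn.execute(sql).fetchone()[0]

print(total("UNION"), total("UNION ALL"))  # 35000 70000
```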

Answer 2 (score: 0)

In the end, a nested query solved it:

SELECT DISTINCT a.url
, q.mod_date
, IF(q.b_min_date < q.c_min_date, q.b_min_date, q.c_min_date) AS 'min_date'
, IF(q.b_max_date > q.c_max_date, q.b_max_date, q.c_max_date) AS 'max_date'
, (PERIOD_DIFF(DATE_FORMAT(IF(q.b_max_date > q.c_max_date, q.b_max_date, q.c_max_date), '%Y%m'),DATE_FORMAT(IF(q.b_min_date < q.c_min_date, q.b_min_date, q.c_min_date), '%Y%m')) + 1) AS 'months'
, q.page_views
, ROUND(q.page_views / ((PERIOD_DIFF(DATE_FORMAT(IF(q.b_max_date > q.c_max_date, q.b_max_date, q.c_max_date), '%Y%m'),DATE_FORMAT(IF(q.b_min_date < q.c_min_date, q.b_min_date, q.c_min_date), '%Y%m'))) + 1)) AS 'avg_monthly_hits'
FROM table_a a
INNER JOIN
    (SELECT 
            a.url,
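                a.mod_date AS 'mod_date',
                MIN(b.access_date) AS 'b_min_date',
                MAX(b.access_date) AS 'b_max_date',
                MIN(c.access_date) AS 'c_min_date',
                MAX(c.access_date) AS 'c_max_date',
                -- caution: the two LEFT JOINs repeat rows when a url appears several times in both tables, inflating this SUM
                SUM(ifnull(b.views, 0)) + SUM(ifnull(c.views, 0)) AS 'page_views'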
        FROM
            table_a a
                LEFT JOIN
            table_b b ON a.url = b.url
                LEFT JOIN
            table_c c ON a.url = c.url
        GROUP BY a.url
) q
ON a.url = q.url
WHERE ROUND(q.page_views / ((PERIOD_DIFF(DATE_FORMAT(IF(q.b_max_date > q.c_max_date, q.b_max_date, q.c_max_date), '%Y%m'),DATE_FORMAT(IF(q.b_min_date < q.c_min_date, q.b_min_date, q.c_min_date), '%Y%m'))) + 1)) < 5
;
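This answer counts calendar months instead of dividing days by 30.44. In MySQL, PERIOD_DIFF(P1, P2) returns the number of months from period P2 to period P1. Here the periods are YYYYMM values produced by DATE_FORMAT(..., '%Y%m'), and the + 1 counts both end months. The sketch below mirrors that arithmetic in Python, using hypothetical helper names:

```python
from datetime import date

def months_inclusive(first, last):
    """Mirror of PERIOD_DIFF(DATE_FORMAT(last,'%Y%m'), DATE_FORMAT(first,'%Y%m')) + 1."""
    return (last.year * 12 + last.month) - (first.year * 12 + first.month) + 1

def avg_monthly_hits(total_views, first, last):
    """Views per calendar month touched by the access dates."""
    return round(total_views / months_inclusive(first, last))

# /foo/index.html: 2009-10 through 2010-02 covers 5 calendar months.
print(months_inclusive(date(2009, 10, 1), date(2010, 2, 1)))         # 5
print(avg_monthly_hits(90000, date(2009, 10, 1), date(2010, 2, 1)))  # 18000
```

Because of the + 1, a url seen in only one month gets a divisor of 1 rather than 0. This avoids the zero-length span that the DATEDIFF version runs into.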