Query to return datetimes that exist in both tables

Date: 2018-01-26 08:32:58

Tags: sql postgresql

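I am trying to return a table that joins data from an ads table and a website traffic table, both of which contain hourly data. However, a timestamp that exists for a given date in the ads table may not exist in the website table. For example, the timestamp "2017-09-27 20:00:00+00" exists in the website traffic table but not in the ads table, and the reverse also happens.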

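I am using a query that selects the ads table's timestamp and uses a LEFT JOIN. A FULL OUTER JOIN does not seem to fix the problem, most likely because the ads timestamp is selected rather than the website traffic timestamp.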

Is there a way in PostgreSQL to return the timestamps from both tables in a single column?

Many thanks.

The query currently in use is below:

SELECT
    ads.phase AS "phase",
    ads.datetime_utc AS "datetime",
    lower(array_to_string((regexp_split_to_array(ads.placement, '_'))[1:9], '_')) AS "delim_dims",
    a.name AS "name",
    ads.device AS "device",
    sum(ads.impressions) AS "impressions",
    sum(ads.clicks) AS "clicks",
    sum(ads.spend) AS "spend",
    web.sessions AS "sessions",
    web.bounces AS "bounces"
FROM
    ads_data AS ads
INNER JOIN
    lookup.names_lookup AS a ON
    ads.lookup_code = a.lookup_code
LEFT JOIN -- tested with FULL OUTER JOIN, returns same results
    web.website_traffic AS web ON
    ads.datetime_utc = web.datetime_est
    AND
    a.lookup_code = web.lookup_code
    AND
    ads.device = web.device
GROUP BY
    ads.phase,
    datetime,
    delim_dims,
    a.audience_name,
    web.sessions,
    web.bounces,
    device
HAVING
    sum(ads.spend) > 0

1 Answer:

Answer 0 (score: 0)

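I'm confused by your wording. Your question title asks for "datetimes that exist in both tables", which suggests you only want rows where ads_data and web.website_traffic have matching datetimes. But you want to use a LEFT JOIN or FULL OUTER JOIN, which makes me think you want rows where the datetime exists in either table. My interpretation is that you want one column holding the datetime from either table: if a row has matching timestamps in both tables, great; if the timestamp exists in only one table or the other, return that value.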

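It looks like your problem is that you do an INNER JOIN between lookup.names_lookup (a) and ads_data. Then, when you join web.website_traffic, one of your join conditions is a.lookup_code = web.lookup_code. This essentially turns your LEFT JOIN into an INNER JOIN, so you only get results for data in ads_data, and you never get the case where a row exists in web.website_traffic but not in ads_data.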

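Instead, I would start with a subquery (a CTE) that is a FULL OUTER JOIN of ads_data with web.website_traffic, so that all unmatched rows plus all matched rows are combined, and then INNER JOIN that result with lookup.names_lookup.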
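To make the FULL OUTER JOIN plus COALESCE idea concrete, here is a minimal, self-contained sketch. The two inline tables (ads, web), their columns (ts, spend, sessions) and the values are hypothetical toy data, not your real schema:

```sql
-- Hypothetical toy data: two hourly tables that overlap on only one timestamp.
WITH ads (ts, spend) AS (
    VALUES (timestamptz '2017-09-27 18:00:00+00', 10.0),
           (timestamptz '2017-09-27 19:00:00+00', 12.5)
),
web (ts, sessions) AS (
    VALUES (timestamptz '2017-09-27 19:00:00+00', 40),
           (timestamptz '2017-09-27 20:00:00+00', 55)
)
SELECT
    COALESCE(ads.ts, web.ts) AS ts,  -- whichever side has the timestamp
    ads.spend,                       -- NULL for hours that exist only in web
    web.sessions                     -- NULL for hours that exist only in ads
FROM ads
FULL OUTER JOIN web ON ads.ts = web.ts
ORDER BY 1;
```

This should return three rows: the 18:00 hour with spend only, the 19:00 hour with both spend and sessions, and the 20:00 hour with sessions only. A LEFT JOIN from ads would return only the first two.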

I noticed a few issues:

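  1. The name of the ads_data datetime column indicates UTC time, while the web.website_traffic column name refers to EST time.
  2. I find it odd that you group by sessions and bounces, since these are numeric measures; I would aggregate them instead.
  3. You reference "delim_dims" in the GROUP BY even though it is computed in the SELECT clause. PostgreSQL accepts an output-column name there, but standard SQL does not (the SELECT list is logically evaluated after GROUP BY), so I repeat the expression in the GROUP BY.

Here is an attempt at the SQL (ignoring the potential UTC vs. EST issue):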

    WITH alldata AS (
        SELECT
            ads.phase,
            COALESCE(ads.datetime_utc, web.datetime_est) AS "datetime",
            COALESCE(ads.lookup_code, web.lookup_code) AS "lookup_code",
            COALESCE(ads.device, web.device) AS "device",
            ads.placement,
            ads.impressions,
            ads.clicks,
            ads.spend,
            web.sessions,
            web.bounces
        FROM
            ads_data AS ads FULL OUTER JOIN web.website_traffic AS web ON
                ads.datetime_utc = web.datetime_est AND
                ads.lookup_code = web.lookup_code AND
                ads.device = web.device
    )
    SELECT
        alldata.phase AS "phase",
        alldata.datetime AS "datetime",
        lower(array_to_string((regexp_split_to_array(alldata.placement, '_'))[1:9], '_')) AS "delim_dims",
        a.name AS "name",
        alldata.device AS "device",
        sum(alldata.impressions) AS "impressions",
        sum(alldata.clicks) AS "clicks",
        sum(alldata.spend) AS "spend",
        min(alldata.sessions) AS "sessions",
        min(alldata.bounces) AS "bounces"
    FROM
        alldata INNER JOIN lookup.names_lookup AS a ON 
            alldata.lookup_code = a.lookup_code
    GROUP BY
        alldata.phase,
        alldata.datetime,
        lower(array_to_string((regexp_split_to_array(alldata.placement, '_'))[1:9], '_')),
        a.name,  -- must match the a.name expression in SELECT
        alldata.device
    HAVING
        -- spend is NULL for web-only rows, so this filter also drops hours with no ads data
        sum(alldata.spend) > 0
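
The query above leaves the UTC vs. EST mismatch alone. If it turns out to matter, one possible way to line the two columns up is sketched below. This rests on assumptions I cannot verify from the question: that web.datetime_est is a "timestamp without time zone" holding US Eastern wall-clock time, and that ads.datetime_utc is a "timestamp with time zone". If the web data is stored at a fixed UTC-5 offset with no daylight saving, 'EST' would be the zone name to use instead of 'America/New_York'.

```sql
-- Assumption: datetime_est is "timestamp without time zone" in US Eastern
-- local time; datetime_utc is "timestamp with time zone".
-- AT TIME ZONE converts the local timestamp into a timestamptz so both
-- sides of the comparison refer to the same instant.
SELECT
    COALESCE(ads.datetime_utc,
             web.datetime_est AT TIME ZONE 'America/New_York') AS "datetime",
    COALESCE(ads.lookup_code, web.lookup_code) AS "lookup_code",
    COALESCE(ads.device, web.device) AS "device"
FROM
    ads_data AS ads FULL OUTER JOIN web.website_traffic AS web ON
        ads.datetime_utc = (web.datetime_est AT TIME ZONE 'America/New_York') AND
        ads.lookup_code = web.lookup_code AND
        ads.device = web.device;
```

The same converted expression would replace web.datetime_est in both the COALESCE and the join condition of the alldata CTE.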