Using JOIN with DISTINCT and prioritizing one table

Time: 2013-12-18 13:36:21

Tags: mysql

I am trying to merge data from two tables. Both tables contain data from the same sensor (say, a sensor that measures CO2 and writes a value every 10 minutes).

The first table contains validated data; let's call it station1_validated. The second table contains raw data; let's call it station1_nrt.

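While the raw-data table contains real-time data, the validated table only contains data points that are at least one month old. (Validating the data and then checking it manually takes a while, and it only happens once a month.)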

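What I want to do now is combine the data from both tables to display real-time data on a website. However, whenever validated data is available, the validated data point should take priority over the raw one.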

The relevant columns are:

  • timed [bigint(20)]: the date and time as a Unix timestamp in milliseconds since 1970-01-01
  • CO2 [double]: the measured CO2 concentration in ppm (parts per million)

I wrote this basic SQL:

SELECT 
    *
FROM
    (SELECT 
        timed, CO2, '2' tab
    FROM
        station1_nrt
    WHERE
        TIMED >= 1386932400000
            AND TIMED <= 1386939600000
            AND TIMED NOT IN (SELECT 
                timed
            FROM
                station1_nrt
            WHERE
                CO2 IS NOT NULL
                    AND TIMED >= 1386932400000
                    AND TIMED <= 1386939600000) UNION SELECT 
        timed, CO2, '1' tab
    FROM
        station1_validated
    WHERE
        CO2 IS NOT NULL
            AND TIMED >= 1386932400000
            AND TIMED <= 1386939600000) a
ORDER BY timed

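This doesn't work properly, because it only selects data points for which both tables have entries. Besides, I'd like to do this with a JOIN, because that should be faster. However, I don't know how to join the tables and use DISTINCT (or something similar) to give one table priority. Can someone help me with this (or explain it)?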

4 answers:

Answer 0 (score: 3)

You haven't said whether station1_validated can contain records that are not in station1_nrt, so I used a FULL JOIN. If every row in station1_validated also exists in station1_nrt, you can use a LEFT JOIN instead.

Something like this:

SELECT IFNULL(n.timed,v.timed) as timed,
       CASE WHEN v.timed IS NOT NULL THEN v.CO2 ELSE n.CO2 END as CO2,
       CASE WHEN v.timed IS NOT NULL THEN '1' ELSE '2' END as tab

FROM station1_nrt as n
FULL JOIN station1_validated as v ON n.timed=v.timed AND v.CO2 IS NOT NULL
    WHERE
        ( n.TIMED between 1386932400000 AND 1386939600000
          or 
          v.TIMED between 1386932400000 AND 1386939600000
        )
        AND 
        (n.CO2 IS NOT NULL OR v.CO2 IS NOT NULL)
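MySQL does not support FULL [OUTER] JOIN, so the query above will not run there as written. A common workaround, sketched here under the assumption that timed is unique within each table, is to take every raw row with a LEFT JOIN and then append, with UNION ALL, the validated rows that have no raw counterpart. The WITH clause holds a few hypothetical sample rows that shadow the real tables (CTEs need MySQL 8.0 or later); delete it to run the query against station1_nrt and station1_validated.

```sql
-- Hypothetical sample rows standing in for the real tables;
-- delete this WITH clause to query station1_nrt / station1_validated.
WITH station1_nrt (timed, CO2) AS (
    SELECT 1386932400000, 400.0 UNION ALL   -- 11:00 UTC
    SELECT 1386933000000, 401.0 UNION ALL   -- 11:10, not validated yet
    SELECT 1386933600000, 402.0             -- 11:20
),
station1_validated (timed, CO2) AS (
    SELECT 1386932400000, 399.5 UNION ALL   -- 11:00, overrides 400.0
    SELECT 1386933600000, NULL UNION ALL    -- 11:20, no usable validated value
    SELECT 1386934200000, 403.5             -- 11:30, missing from the raw table
)
-- Branch 1: every raw row, replaced by a non-NULL validated value if one exists
SELECT n.timed AS timed,
       COALESCE(v.CO2, n.CO2) AS CO2,
       CASE WHEN v.CO2 IS NOT NULL THEN '1' ELSE '2' END AS tab
FROM station1_nrt AS n
LEFT JOIN station1_validated AS v
       ON v.timed = n.timed AND v.CO2 IS NOT NULL
WHERE n.timed BETWEEN 1386932400000 AND 1386939600000
UNION ALL
-- Branch 2: validated rows that have no raw counterpart at all
SELECT v.timed AS timed, v.CO2 AS CO2, '1' AS tab
FROM station1_validated AS v
WHERE v.CO2 IS NOT NULL
  AND v.timed BETWEEN 1386932400000 AND 1386939600000
  AND NOT EXISTS (SELECT 1 FROM station1_nrt AS n2 WHERE n2.timed = v.timed)
ORDER BY timed;
```

On the sample rows this returns four rows for 2013-12-13: the validated 399.5 at 11:00 UTC, the raw 401.0 at 11:10, the raw 402.0 at 11:20 (its validated CO2 is NULL) and the validated-only 403.5 at 11:30. Applying the time range to each branch's own, never-NULL timed column keeps the filter off the NULL-extended side of the join, so no OR across both tables is needed.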

Answer 1 (score: 1)

You can join and then use IFNULL on each field to pick the validated value if it exists. Something like:

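SELECT
    s1.timed,
    -- take the validated value when one exists, otherwise the raw value
    IFNULL(s1val.CO2, s1.CO2) AS CO2,
    -- record where the value came from ('1' = validated, '2' = raw)
    IF(s1val.CO2 IS NULL, '2', '1') AS tab
FROM
    station1_nrt s1
    LEFT JOIN station1_validated s1val ON (s1.timed = s1val.timed)
WHERE
    -- filter on the raw table, the preserved side of the LEFT JOIN
    s1.timed BETWEEN 1386932400000 AND 1386939600000
ORDER BY s1.timed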
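To see what this LEFT JOIN form does with rows that exist in only one table, here is a sketch that runs it on a few hypothetical sample rows supplied through a WITH clause (MySQL 8.0 or later; remove the clause to use the real tables). A CASE expression stands in for IF so the snippet stays portable.

```sql
-- Hypothetical sample rows shadowing the real tables.
WITH station1_nrt (timed, CO2) AS (
    SELECT 1386932400000, 400.0 UNION ALL
    SELECT 1386933000000, 401.0 UNION ALL
    SELECT 1386933600000, 402.0
),
station1_validated (timed, CO2) AS (
    SELECT 1386932400000, 399.5 UNION ALL
    SELECT 1386933600000, NULL UNION ALL
    SELECT 1386934200000, 403.5   -- no raw row: a LEFT JOIN from station1_nrt drops it
)
SELECT s1.timed,
       -- validated value when present and non-NULL, raw value otherwise
       IFNULL(s1val.CO2, s1.CO2) AS CO2,
       CASE WHEN s1val.CO2 IS NULL THEN '2' ELSE '1' END AS tab
FROM station1_nrt AS s1
LEFT JOIN station1_validated AS s1val ON s1val.timed = s1.timed
-- filter on the preserved (raw) side; a range test on s1val.timed here would
-- discard the NULL-extended rows and effectively make this an inner join
WHERE s1.timed BETWEEN 1386932400000 AND 1386939600000
ORDER BY s1.timed;
```

It returns three rows: the validated 399.5 at 11:00 UTC, and the raw 401.0 and 402.0 at 11:10 and 11:20 (the latter because its validated CO2 is NULL). The validated-only reading at 11:30 is missing, so this form is only complete if every validated timestamp also exists in station1_nrt.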

Answer 2 (score: 1)

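MySQL has an IF function that might work for you. You have to list the specific columns (rather than using *), but you can build the query programmatically.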

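SELECT
    nrt.timed,
    -- timed is in milliseconds, FROM_UNIXTIME expects seconds;
    -- readings older than a month should have been validated by now
    IF(FROM_UNIXTIME(nrt.timed / 1000) < DATE_SUB(NOW(), INTERVAL 1 MONTH),
        val.CO2,
        nrt.CO2
    ) AS CO2
    -- Similar for other values
FROM
    station1_nrt AS nrt
    -- LEFT JOIN so that recent, not-yet-validated readings are kept
    LEFT JOIN station1_validated AS val USING(id)
ORDER BY nrt.timed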

Note that USING(id) is a placeholder. Presumably there is some indexed column on which the two tables can be joined.
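The idea of switching on the age of a reading can also be expressed without any join. Assuming, as the question describes, that validation happens in monthly batches so that station1_validated is complete up to its latest timestamp, the boundary can be taken from MAX(timed) in that table instead of a fixed one-month interval: validated rows up to that point, raw rows after it. This is only a sketch under that assumption; if validation leaves gaps before its latest timestamp, those readings disappear from the result. The WITH clause holds hypothetical sample rows (MySQL 8.0 or later).

```sql
-- Hypothetical sample rows: validation has reached 11:10 UTC, raw data runs to 11:30.
WITH station1_nrt (timed, CO2) AS (
    SELECT 1386932400000, 400.0 UNION ALL
    SELECT 1386933000000, 401.0 UNION ALL
    SELECT 1386933600000, 402.0 UNION ALL
    SELECT 1386934200000, 403.0
),
station1_validated (timed, CO2) AS (
    SELECT 1386932400000, 399.5 UNION ALL
    SELECT 1386933000000, 400.5
)
-- Validated rows up to wherever validation has reached ...
SELECT v.timed AS timed, v.CO2 AS CO2, '1' AS tab
FROM station1_validated AS v
WHERE v.timed BETWEEN 1386932400000 AND 1386939600000
UNION ALL
-- ... followed by raw rows newer than the last validated timestamp.
-- COALESCE covers an empty validated table, where MAX returns NULL.
SELECT n.timed AS timed, n.CO2 AS CO2, '2' AS tab
FROM station1_nrt AS n
WHERE n.timed BETWEEN 1386932400000 AND 1386939600000
  AND n.timed > COALESCE((SELECT MAX(timed) FROM station1_validated), 0)
ORDER BY timed;
```

On the sample rows this returns the validated 399.5 and 400.5 for 11:00 and 11:10 UTC and the raw 402.0 and 403.0 for 11:20 and 11:30. Assuming timed is indexed in both tables (for example as the primary key), the MAX lookup and both range filters can use that index.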

Answer 3 (score: 0)

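@Jim, @valex, @ExplosionPills: I managed to write a SELECT that emulates a FULL OUTER JOIN (MySQL has no FULL JOIN) and returns the validated value where one exists. Where no validated data is available, it returns the raw value.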

So this is the SQL I'm using now:

SET @StartTime = 1356998400000;
SET @EndTime   = 1386546000000;

SELECT
    mergedData.timed,
    -- prefer the validated value, fall back to the raw value
    IFNULL(mergedData.validatedValue, mergedData.rawValue) AS value
FROM (
    -- every raw row, with its validated counterpart if there is one
    (SELECT
        FROM_UNIXTIME(timed / 1000) AS timed,
        rawData.NOX AS rawValue,
        validatedData.NOX AS validatedValue
    FROM
        nabelnrt_bas AS rawData
        LEFT JOIN nabelvalidated_bas AS validatedData USING (timed)
    WHERE
        (rawData.timed > @StartTime AND rawData.timed < @EndTime)
        OR (validatedData.timed > @StartTime AND validatedData.timed < @EndTime))
    -- UNION (not UNION ALL) drops the matched rows that both branches return
    UNION
    -- every validated row, with its raw counterpart if there is one
    (SELECT
        FROM_UNIXTIME(timed / 1000) AS timed,
        rawData.NOX AS rawValue,
        validatedData.NOX AS validatedValue
    FROM
        nabelnrt_bas AS rawData
        RIGHT JOIN nabelvalidated_bas AS validatedData USING (timed)
    WHERE
        (rawData.timed > @StartTime AND rawData.timed < @EndTime)
        OR (validatedData.timed > @StartTime AND validatedData.timed < @EndTime))
) AS mergedData
-- sort in the outer query: an ORDER BY inside the derived table
-- does not guarantee the order of the final result
ORDER BY mergedData.timed DESC;
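For comparison, the question's original UNION approach needs no JOIN at all once its exclusion subquery reads from station1_validated. As written it reads from station1_nrt, so it discards every raw row that has a CO2 value instead of only the rows that have already been validated. The sketch below applies that fix, using NOT EXISTS (which, unlike NOT IN, is not tripped up by NULLs in the subquery) and UNION ALL, since, assuming timed is unique within each table, the two branches can never return the same timestamp. The WITH clause again holds hypothetical sample rows (MySQL 8.0 or later).

```sql
-- Hypothetical sample rows shadowing the real tables.
WITH station1_nrt (timed, CO2) AS (
    SELECT 1386932400000, 400.0 UNION ALL
    SELECT 1386933000000, 401.0 UNION ALL
    SELECT 1386933600000, 402.0
),
station1_validated (timed, CO2) AS (
    SELECT 1386932400000, 399.5 UNION ALL
    SELECT 1386933600000, NULL UNION ALL
    SELECT 1386934200000, 403.5
)
-- All usable validated values in the window
SELECT v.timed AS timed, v.CO2 AS CO2, '1' AS tab
FROM station1_validated AS v
WHERE v.CO2 IS NOT NULL
  AND v.timed BETWEEN 1386932400000 AND 1386939600000
UNION ALL
-- Raw values only where no usable validated value exists for that timestamp
SELECT n.timed AS timed, n.CO2 AS CO2, '2' AS tab
FROM station1_nrt AS n
WHERE n.timed BETWEEN 1386932400000 AND 1386939600000
  AND NOT EXISTS (
      SELECT 1
      FROM station1_validated AS v2
      WHERE v2.timed = n.timed
        AND v2.CO2 IS NOT NULL
  )
ORDER BY timed;
```

On the sample rows this returns 399.5 (validated, 11:00 UTC), 401.0 (raw, 11:10), 402.0 (raw, 11:20, because its validated CO2 is NULL) and 403.5 (validated, 11:30).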