Spark SQL: selecting from the result of another query

Time: 2015-10-22 18:56:48

Tags: sql select nested apache-spark-sql

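I have a table named table_new. As a first step, I want to group the rows by id, kmstand, vacationame and vacationvalue and keep only the groups that contain exactly one row. For this step I wrote the following query: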

SELECT id, kmstand, vacationame, vacationvalue 
 FROM `db_1`.`table_new` 
 WHERE (vacationame='vacation1' 
    OR vacationame='vacation2' 
    OR vacationame='vacation3' 
    OR vacationame='vacation4') 
 GROUP BY id, kmstand, vacationame, vacationvalue 
 HAVING COUNT(*) = 1 ORDER BY id, kmstand DESC

The result is:

    id  kmstand  vacationame  vacationvalue
1   1   4000     vacation1    munich
2   1   4000     vacation1    stuttgart
3   1   5500     vacation4    koln
4   1   5500     vacation2    frankfurt
5   1   5500     vacation3    berlin
6   1   5500     vacation1    potsdam
7   2   6000     vacation2    new york
8   2   6000     vacation1    bangladesh
9   2   3000     vacation1    washington
10  2   3000     vacation3    chicago

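Now I want to keep only the rows whose combination of id, kmstand and vacationame occurs exactly once. The two rows for id 1, kmstand 4000, vacation1 therefore drop out, because that combination appears twice. This means the result should be: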

    id  kmstand  vacationame  vacationvalue
1   1   5500     vacation4    koln
2   1   5500     vacation2    frankfurt
3   1   5500     vacation3    berlin
4   1   5500     vacation1    potsdam
5   2   6000     vacation2    new york
6   2   6000     vacation1    bangladesh
7   2   3000     vacation1    washington
8   2   3000     vacation3    chicago

To do this, I wrote the following nested SQL query:

SELECT id, kmstand, count(*) as cnt
FROM `db_1`.`table_new`
WHERE (SELECT id, kmstand, vacationame, vacationvalue 
     FROM `db_1`.`table_new` 
     WHERE (vacationame='vacation1' 
        OR vacationame='vacation2' 
        OR vacationame='vacation3' 
        OR vacationame='vacation4') 
     GROUP BY id, kmstand, vacationame, vacationvalue 
     HAVING COUNT(*) = 1 ORDER BY id, kmstand DESC) 
GROUP BY id, kmstand 
HAVING cnt = 1 
ORDER BY id, kmstand DESC

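I also tried it without the WHERE clause and without the FROM clause, but did not find a solution. For this SQL query I get the following error message:

org.apache.spark.sql.AnalysisException: cannot recognize input near 'SELECT' 'id' ',' in expression specification; line 3 pos 7

I am fairly sure the syntax is wrong. Do you have any suggestions?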

2 answers:

Answer 0: (score: 1)

This is the answer to the question. I can now get the ids for which the combination of id, kmstand and vacationame is distinct.


Answer 1: (score: 0)

I am not familiar with Spark, but you probably want:

SELECT id, kmstand, count(*) as cnt 
FROM (SELECT id, kmstand, vacationame, vacationvalue 
     FROM `db_1`.`table_new` 
     WHERE (vacationame='vacation1' 
        OR vacationame='vacation2' 
        OR vacationame='vacation3' 
        OR vacationame='vacation4') 
     GROUP BY id, kmstand, vacationame, vacationvalue 
     HAVING COUNT(*) = 1) T
GROUP BY id, kmstand 
HAVING cnt = 1 
ORDER BY id, kmstand DESC

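Note that I moved the subquery into the FROM clause and gave the resulting derived table an alias (T). Depending on your RDBMS, the alias may or may not be required.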

Also note that you usually cannot use ORDER BY inside a subquery, which is why the inner query above no longer has one.
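
The derived-table idea can be tried out on the sample rows from the question. The following is only a sketch under stated assumptions: it runs on an in-memory SQLite database through Python's sqlite3 module rather than on Spark, the table is created without the `db_1` schema prefix, the ten sample rows stand in for the real contents of table_new, and the WITH clause and the IN list are rewrites introduced here, not part of either answer. It has not been run against Spark SQL.

```python
import sqlite3

# Sample rows copied from the first result set in the question:
# (id, kmstand, vacationame, vacationvalue).
rows = [
    (1, 4000, "vacation1", "munich"),
    (1, 4000, "vacation1", "stuttgart"),
    (1, 5500, "vacation4", "koln"),
    (1, 5500, "vacation2", "frankfurt"),
    (1, 5500, "vacation3", "berlin"),
    (1, 5500, "vacation1", "potsdam"),
    (2, 6000, "vacation2", "new york"),
    (2, 6000, "vacation1", "bangladesh"),
    (2, 3000, "vacation1", "washington"),
    (2, 3000, "vacation3", "chicago"),
]

# Assumption: SQLite stands in for Spark SQL; no db_1 schema is used.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_new ("
    "id INTEGER, kmstand INTEGER, vacationame TEXT, vacationvalue TEXT)"
)
conn.executemany("INSERT INTO table_new VALUES (?, ?, ?, ?)", rows)

query = """
WITH step1 AS (
    -- Step 1 from the question: keep groups that contain exactly one row.
    SELECT id, kmstand, vacationame, vacationvalue
    FROM table_new
    WHERE vacationame IN ('vacation1', 'vacation2', 'vacation3', 'vacation4')
    GROUP BY id, kmstand, vacationame, vacationvalue
    HAVING COUNT(*) = 1
)
SELECT s.id, s.kmstand, s.vacationame, s.vacationvalue
FROM step1 AS s
JOIN (
    -- Step 2: combinations of (id, kmstand, vacationame) that occur once.
    SELECT id, kmstand, vacationame
    FROM step1
    GROUP BY id, kmstand, vacationame
    HAVING COUNT(*) = 1
) AS u
  ON s.id = u.id
 AND s.kmstand = u.kmstand
 AND s.vacationame = u.vacationame
ORDER BY s.id, s.kmstand DESC
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On the sample rows this drops the two rows for id 1, kmstand 4000, vacation1 and keeps the other eight, which matches the expected result in the question. The second step groups on (id, kmstand, vacationame) instead of (id, kmstand) alone, because every (id, kmstand) pair in the sample data has more than one row. The only ORDER BY sits on the outermost query.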