Creating a table by joining two existing tables in Shark/Hive

Asked: 2014-03-20 06:39:32

Tags: hadoop hive hiveql apache-spark shark-sql

I have two tables, oldTable and newTable, with the following contents:

oldTable

  key    value    volume
  ======================
  1      abc      10000
  2      def      5000

newTable

  key    value    volume
  ======================
  1      abc      2000
  2      def      3000
  3      xyz      7000

I want to create a new table that sums the volume values from both tables. That is, the new table should contain:

joined_table

  key    value    volume
  ======================
  1      abc      12000
  2      def      8000
  3      xyz      7000

I tried the following statement, without success:

CREATE TABLE joined_table AS
SELECT key, value, volume
FROM (
    SELECT IF(oldTable.key != NULL, oldTable.key, newTable.key) AS key,
        IF(oldTable.value != NULL, oldTable.value, newTable.value) AS value,
        IF(oldTable.volume AND newTable.volume, oldTable.volume + newTable.volume,
    IF(oldTable.volume != NULL, oldTable.volume, newTable.volume)) AS volume
    FROM(
        SELECT oldTable.key, oldTable.value, oldTable.volume, newTable.key, newTable.value, newTable.volume
        FROM newTable FULL OUTER JOIN oldTable ON newTable.key = oldTable.key
    )alias
)anotherAlias;

But this gives me the error: Query returned non-zero code: 10, cause: FAILED: Error in semantic analysis: Ambiguous column reference key

I tried changing the column names of joined_table in the query above, but I got the same error. Any help on how to achieve this?

Also, is there a way to write the result over an existing table, say oldTable, instead of creating this new table?

2 Answers:

Answer 0: (score: 0)

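The word key that you use in your query is a reserved keyword. That may be why the parser throws the ambiguity error. You can wrap it in backticks so that the parser does not read it as a reserved word.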

CREATE TABLE joined_table AS
SELECT `key`, value, volume
FROM (
    SELECT IF(oldTable.`key` != NULL, oldTable.`key`, newTable.`key`) AS `key`,
        IF(oldTable.value != NULL, oldTable.value, newTable.value) AS value,
        IF(oldTable.volume AND newTable.volume, oldTable.volume + newTable.volume,
            IF(oldTable.volume != NULL, oldTable.volume, newTable.volume)) AS volume
    FROM(
        SELECT oldTable.`key`, oldTable.value, oldTable.volume, newTable.`key`, newTable.value, newTable.volume
        FROM newTable FULL OUTER JOIN oldTable ON newTable.`key` = oldTable.`key`
    )alias
)anotherAlias;
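
Apart from quoting, two other things in the nested query are worth checking. The innermost SELECT returns two columns named key (one from each table), and Hive typically reports "Ambiguous column reference" when a subquery produces duplicate column names. In addition, a comparison such as x != NULL evaluates to NULL rather than true in SQL, so the IF always takes its else branch. The following is only a sketch under those assumptions; the aliases old_key, new_key and so on are hypothetical names that do not appear in the original post.

```sql
-- Sketch: give every inner column a distinct (hypothetical) alias
-- and test for NULL with IS NOT NULL instead of != NULL.
CREATE TABLE joined_table AS
SELECT
    IF(old_key IS NOT NULL, old_key, new_key) AS `key`,
    IF(old_value IS NOT NULL, old_value, new_value) AS value,
    IF(old_volume IS NOT NULL AND new_volume IS NOT NULL,
       old_volume + new_volume,
       IF(old_volume IS NOT NULL, old_volume, new_volume)) AS volume
FROM (
    SELECT oldTable.`key`   AS old_key,
           oldTable.value   AS old_value,
           oldTable.volume  AS old_volume,
           newTable.`key`   AS new_key,
           newTable.value   AS new_value,
           newTable.volume  AS new_volume
    FROM newTable FULL OUTER JOIN oldTable ON newTable.`key` = oldTable.`key`
) alias;
```

With distinct aliases the outer query no longer has to reach into oldTable or newTable, which are not visible outside the subquery anyway.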

Answer 1: (score: 0)

OK. I managed to get this working with the following:

CREATE TABLE joined_table AS SELECT 
IF (newTable.key IS NULL, oldTable.key, newTable.key) as key, 
IF (newTable.value IS NULL, oldTable.value, newTable.value) as value, 
IF(newTable.volume IS NULL, oldTable.volume, 
   IF(oldTable.volume IS NULL, newTable.volume, oldTable.volume + newTable.volume)) as volume 
FROM newTable FULL OUTER JOIN oldTable ON newTable.key = oldTable.key;
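
The nested IF expressions above can also be written with COALESCE, which returns its first non-NULL argument. This is a sketch using the same table and column names, not a drop-in replacement: unlike the IF version, it treats a missing volume as 0, so a key whose volume is NULL on both sides yields 0 instead of NULL.

```sql
-- Sketch: same full outer join, with COALESCE picking the first non-NULL value.
CREATE TABLE joined_table AS
SELECT
    COALESCE(newTable.key, oldTable.key) AS key,
    COALESCE(newTable.value, oldTable.value) AS value,
    -- A volume missing on either side counts as 0 in the sum.
    COALESCE(oldTable.volume, 0) + COALESCE(newTable.volume, 0) AS volume
FROM newTable FULL OUTER JOIN oldTable ON newTable.key = oldTable.key;
```

For the sample data in the question this gives 12000 for key 1, 8000 for key 2, and 7000 for key 3.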

I still need to figure out how to update the existing table without creating a new one.

Update:

INSERT OVERWRITE TABLE oldTable SELECT ... updates the existing table.
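
Spelled out with the SELECT from this answer, the statement might look like the sketch below. It assumes that oldTable has exactly the columns key, value, and volume in that order, because INSERT OVERWRITE matches columns by position. Whether a table can safely be overwritten by a query that also reads from it may depend on the engine and version, so it is worth trying on a copy first.

```sql
-- Sketch: replace the contents of oldTable with the merged result.
-- Assumes oldTable's columns are (key, value, volume), in that order.
INSERT OVERWRITE TABLE oldTable
SELECT
    IF(newTable.key IS NULL, oldTable.key, newTable.key) AS key,
    IF(newTable.value IS NULL, oldTable.value, newTable.value) AS value,
    IF(newTable.volume IS NULL, oldTable.volume,
       IF(oldTable.volume IS NULL, newTable.volume,
          oldTable.volume + newTable.volume)) AS volume
FROM newTable FULL OUTER JOIN oldTable ON newTable.key = oldTable.key;
```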