Hive delete row with composite primary key

Date: 2017-08-04 13:02:37

Tags: sql hive sql-delete acid

The ACID properties in Hive allow you to delete rows from a table using the following syntax:

DELETE FROM table 
WHERE id IN (SELECT id FROM raw_table)

But what is the best way to delete rows when the primary key is composed of several columns?

I have tried the following with EXISTS:

DELETE FROM table 
WHERE EXISTS (SELECT id1, id2 FROM raw_table 
              WHERE raw_table.id1 = table.id1 AND raw_table.id2 = table.id2) 

Or the following (concatenating all the columns; I am not sure whether this is valid):

DELETE FROM table 
WHERE CONCAT(id1, id2) IN (SELECT CONCAT(id1, id2) FROM raw_table)

Do you have any advice on which solution is best?

1 answer:

Answer 0 (score: 0)

The solution using EXISTS is valid. Your solution that concatenates the values also works, but depending on the values in your data you may delete rows you did not intend to delete. For example, with

id1: 0
id2: 11

you will delete the rows whose concatenated key is 011, but that key also matches

id1: 01
id2: 1

which is not what you want. I recommend adding a separator between the IDs.
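
A minimal sketch of the separator approach, reusing the placeholder names `table`, `raw_table`, `id1`, and `id2` from the question. The `'|'` separator is an assumption: any string works as long as it can never occur inside the key values themselves.

```sql
-- Sketch only: '|' is an assumed separator that must not appear in id1 or id2.
-- With it, (id1 = '0', id2 = '11') becomes '0|11' and (id1 = '01', id2 = '1')
-- becomes '01|1', so the two keys no longer collide.
DELETE FROM table
WHERE CONCAT(id1, '|', id2) IN (SELECT CONCAT(id1, '|', id2) FROM raw_table)
```

The EXISTS form from the question avoids the issue entirely, since it compares each key column separately instead of building a combined string.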

Both solutions should run as a single job with only a map phase and a reduce phase, so the execution plan and performance should be almost the same.

Regards!