Question

Hive's ACID support allows deleting rows from a table using the following syntax:

DELETE FROM table 
WHERE id IN (SELECT id FROM raw_table)

But what is the best way to delete rows when the primary key is composed of several columns?

I have tried the following with EXISTS:

DELETE FROM table 
WHERE EXISTS (SELECT id1, id2 FROM raw_table 
              WHERE raw_table.id1 = table.id1 AND raw_table.id2 = table.id2)

Or the following (concatenating all the columns; I am not sure whether this is valid):

DELETE FROM table 
WHERE CONCAT(id1, id2) IN (SELECT CONCAT(id1, id2) FROM raw_table)

Do you have any advice on which solution is best?

Answer 1

Your solution using EXISTS is valid. Your solution that concatenates the values also works, but depending on the values involved you may delete data you did not intend to. For example, with

id1: 0
id2: 11

you will delete rows whose concatenated key is 011, but that key also matches

id1: 01
id2: 1

which is not what you want. I recommend adding a separator between the IDs.
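As a minimal sketch of the separator idea, here is the concatenation query from the question with a delimiter placed between the key columns. The choice of '|' is an assumption: it only works if that character never appears in id1 or id2, so pick a different one if it can. The table and column names are the ones used in the question.

```sql
-- Without a separator, (0, 11) and (01, 1) both concatenate to '011'.
-- With a separator they become '0|11' and '01|1', which no longer collide.
-- Assumption: '|' never occurs inside id1 or id2.
DELETE FROM table
WHERE CONCAT(id1, '|', id2) IN (SELECT CONCAT(id1, '|', id2) FROM raw_table)
```

Note that CONCAT returns NULL when any argument is NULL, so rows with a NULL key column will not match in this form. The EXISTS version compares the columns directly and avoids the separator question entirely.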

Both solutions should run as a single job with a map phase and a reduce phase, so the execution plan and performance should be almost the same.

Regards!
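If you want to check this on your own cluster, one option is to compare the plans of the two statements. The sketch below assumes your Hive version accepts EXPLAIN in front of a DELETE on an ACID table; the output format and the number of stages depend on the version and the execution engine.

```sql
-- Plan for the EXISTS variant
EXPLAIN
DELETE FROM table
WHERE EXISTS (SELECT id1, id2 FROM raw_table
              WHERE raw_table.id1 = table.id1 AND raw_table.id2 = table.id2);

-- Plan for the concatenation variant (separator '|' is an assumption)
EXPLAIN
DELETE FROM table
WHERE CONCAT(id1, '|', id2) IN (SELECT CONCAT(id1, '|', id2) FROM raw_table);
```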

Hive delete row with composite primary key
