Hive's ACID support allows deleting rows from a table using the following syntax:
DELETE FROM table
WHERE id IN (SELECT id FROM raw_table)
But what is the best way to delete rows when the primary key is composed of several columns?
I have tried the following with EXISTS:
DELETE FROM table
WHERE EXISTS (SELECT id1, id2 FROM raw_table
WHERE raw_table.id1 = table.id1 AND raw_table.id2 = table.id2)
Or the following (concatenating all the columns; I am not sure whether this is valid):
DELETE FROM table
WHERE CONCAT(id1, id2) IN (SELECT CONCAT(id1, id2) FROM raw_table)
Do you have any advice on which solution is best?
Answer 0: (score: 0)
The solution using EXISTS is valid. Your solution that concatenates the values is also valid, but depending on the values in your data you may delete rows you did not intend to delete. For example, with
id1: 0
id2: 11
you will delete the row whose concatenated key is 011, but that key also matches
id1: 01
id2: 1
which is not what you want. I suggest adding a separator between the IDs.
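As a minimal sketch of the separator idea, the concatenation variant from the question could be written as below. The '|' character is only an assumption for illustration; pick any separator that cannot appear in id1 or id2, otherwise the same ambiguity comes back.

```sql
-- Sketch only: '|' is an assumed separator that must never occur in id1 or id2.
-- With it, (id1 = 0, id2 = 11) becomes '0|11' and (id1 = 01, id2 = 1) becomes '01|1',
-- so the two keys no longer collide on '011'.
DELETE FROM table
WHERE CONCAT(id1, '|', id2) IN (SELECT CONCAT(id1, '|', id2) FROM raw_table)
```

The EXISTS version compares each column separately, so it does not need a separator at all.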
Both solutions should run as a single job with map and reduce phases, so the execution plan and performance should be almost the same.
Regards!