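I have a large data set in which some records are duplicates; they can be identified by duplicate values in two fields.
To find these records, the following query works: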
SELECT * FROM supplierstuffs
GROUP BY "Supplier Code", "Cost ex Tax"
HAVING count("Description") > 1
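A variant of the query above that selects only the grouping columns plus a count can make the size of each duplicate group visible. This is a minimal sketch: the `duplicate_rows` alias is illustrative, and using `count(*)` instead of `count("Description")` is an assumption that rows with a NULL description should be counted too.

```sql
-- List each ("Supplier Code", "Cost ex Tax") pair that occurs more than once,
-- together with the number of rows that share it.
SELECT "Supplier Code",
       "Cost ex Tax",
       count(*) AS duplicate_rows   -- illustrative alias, not from the original
FROM supplierstuffs
GROUP BY "Supplier Code", "Cost ex Tax"
HAVING count(*) > 1
ORDER BY duplicate_rows DESC;
```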
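Basically, what I want to do is combine all the values of "Description" into a single row, and then replace all the duplicate rows with that single row.
Here is my half-broken query so far; it is clumsy and horrible. My main goal is to get this working, but it would not be a bad thing if I learned some new SQL tricks along the way.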
UPDATE supplierstuffs SET "Description" =
(SELECT array_to_string(array_accum("Description"), ', ') FROM supplierstuffs
GROUP BY "Supplier Code", "Cost ex Tax"
HAVING count("Description") > 1)
WHERE .....
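That is as far as I have got. What should I read to get further? I have read several books on the subject and a lot of web pages. In this case, though, I think my problem is not just a lack of SQL (well, that is not my only problem), but rather that I am approaching the problem the wrong way.
Edit 1:
'Name'; 'Supplier Code'; 'Description';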
"7CPS PODIUM S/SLV CRICKET POLO";"7CPS";"04 -14, S - 3XL"
"7CP PODIUM CRICKET PANT ";"7CP";"08 -14, S - 2XL"
"7CPT PODIUM 3/4 SLV CRICKET POLO";"7CPT";"04 -14, S - 3XL"
"7CPL PODIUM L/SLV CRICKET POLO";"7CPL";"04 -14, S - 3XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL, XS - 2XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL, 8-16"
^^ is what I want to create from vv
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"S - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"8-16"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T232RG Raglan Sleeve Tee";"T232RG";"XS - 3XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"T444MS Cool dry breathable sporty T-shirts";"T444MS";"XS - 2XL"
"7CP PODIUM CRICKET PANT ";"7CP";"08 -14"
"7CP PODIUM CRICKET PANT ";"7CP";"S - 2XL"
"7CPL PODIUM L/SLV CRICKET POLO";"7CPL";"04 -14"
"7CPL PODIUM L/SLV CRICKET POLO";"7CPL";"S - 3XL"
"7CPS PODIUM S/SLV CRICKET POLO";"7CPS";"04 -14"
"7CPS PODIUM S/SLV CRICKET POLO";"7CPS";"S - 3XL"
"7CPT PODIUM 3/4 SLV CRICKET POLO";"7CPT";"04 -14"
"7CPT PODIUM 3/4 SLV CRICKET POLO";"7CPT";"S - 3XL"
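^^ Note that rows which do not have multiple description rows need to stay unaltered.
So far I have created the new records in a new table: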
INSERT INTO tmptable
SELECT "Name" , "Supplier Code", array_to_string(array_accum("Description"), ', ')
FROM supplierstuffs
GROUP BY "Name", "Supplier Code", "Description"
HAVING count("Description") > 1
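So all that is left is to delete the records that were caught by the concatenation query. It seems that I can't use DELETE FROM with a HAVING clause? I think DELETE FROM table WHERE oid IN (SELECT oids using a HAVING clause) would work?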
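One way to sketch that delete without relying on oids is to match on the grouping columns themselves, since a grouped subquery can only return the columns it groups by. The choice of "Name" and "Supplier Code" as the key is an assumption here; it has to match whatever columns were used to build the merged rows, and rows whose key columns are NULL will not match the IN test.

```sql
-- Sketch only: remove every row that belongs to a group with more than one
-- description. Assumes the merged rows were already copied to another table.
DELETE FROM supplierstuffs
WHERE ("Name", "Supplier Code") IN (
    SELECT "Name", "Supplier Code"
    FROM supplierstuffs
    GROUP BY "Name", "Supplier Code"
    HAVING count("Description") > 1
);
```

Running this inside a transaction (BEGIN, then COMMIT or ROLLBACK after checking the row count) keeps the delete reversible while it is being tested.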
Edit 2:
SELECT array_accum(oid)
FROM supplierstuffs
GROUP BY "Name", "Supplier Code", "Colour", "Cost ex Tax"
HAVING count("Description") > 1
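This returns a few arrays containing 2 oids each, all of which need to be removed. I feel like I am so close, and yet so far. Thanks in advance.
Answer 0 (score: 2)
The following approach works
Answer 1 (score: 0)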
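So what you have now is this...
DESCRIPTION              SUPPLIER_CODE   COST_EX_TAX
Widget                   X23             42.00
Brass gadget             X23             42.00
Flange                   X42             23.00
Flange, steel            X42             23.00
...and what you want is this...
DESCRIPTION              SUPPLIER_CODE   COST_EX_TAX
Brass gadget, Widget     X23             42.00
Flange, Flange, steel    X42             23.00
That still does not seem like the right thing to do. Those concatenated descriptions look wrong to me. However, you know your data and your customer's requirements better than I do.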
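For reference, the transformation shown in the two listings above can be sketched as a single grouped query that builds a new table instead of updating in place. This is only a sketch under assumptions: it uses the question's column names, the target table name `supplierstuffs_merged` is hypothetical, and `string_agg` requires PostgreSQL 9.0 or later (on older versions the question's `array_to_string(array_accum(...))` expression plays the same role).

```sql
-- Build one row per ("Supplier Code", "Cost ex Tax") pair.
-- Groups with a single row keep their description unchanged, so no HAVING
-- clause and no separate delete step is needed.
CREATE TABLE supplierstuffs_merged AS   -- hypothetical table name
SELECT string_agg("Description", ', ' ORDER BY "Description") AS "Description",
       "Supplier Code",
       "Cost ex Tax"
FROM supplierstuffs
GROUP BY "Supplier Code", "Cost ex Tax";
```

Writing to a new table leaves the original data untouched, so the merged rows can be inspected before anything is dropped or renamed.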