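My goal is to remove duplicate rows from several related tables (MySQL) and to update or create new primary/foreign keys so that the relationships are preserved.
I imported a large number of XML files (> 10,000 files, ~100 GB) into my MySQL database, which produced more than 50 tables. While importing the files, I created primary and foreign keys to maintain the relationships. That was a lot of work, so I don't want to import everything again. That is why I want to make the changes inside the database instead.
The XML files look like this: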
<PRODUCT>
  <PRODUCT_ID>P1</PRODUCT_ID>
  <PRODUCT_NAME>Super Product</PRODUCT_NAME>
  <PART_LIST>
    <PART>
      <PART_ID>1234</PART_ID>
      <PART_NAME>foobar</PART_NAME>
      <RESPONSIBLE>
        <USER_ID>abcd</USER_ID>
        <USER_NAME>Frank</USER_NAME>
        <DEPARTMENT>
          <DEPT_ID>D23E</DEPT_ID>
          <DEPT_NAME>Sales</DEPT_NAME>
        </DEPARTMENT>
      </RESPONSIBLE>
    </PART>
    <PART>
      <PART_ID>3456</PART_ID>
      <PART_NAME>Case</PART_NAME>
      <RESPONSIBLE>
        <USER_ID>cdef</USER_ID>
        <USER_NAME>Will</USER_NAME>
        <DEPARTMENT>
          <DEPT_ID>D23E</DEPT_ID>
          <DEPT_NAME>Sales</DEPT_NAME>
        </DEPARTMENT>
      </RESPONSIBLE>
    </PART>
    <PART>
      <PART_ID>6789</PART_ID>
      <PART_NAME>Button</PART_NAME>
      <RESPONSIBLE>
        <USER_ID>efgh</USER_ID>
        <USER_NAME>John</USER_NAME>
        <DEPARTMENT>
          <DEPT_ID>D99T</DEPT_ID>
          <DEPT_NAME>Production</DEPT_NAME>
        </DEPARTMENT>
      </RESPONSIBLE>
    </PART>
  </PART_LIST>
</PRODUCT>
<PRODUCT>
  <PRODUCT_ID>P2</PRODUCT_ID>
  <PRODUCT_NAME>Good Product</PRODUCT_NAME>
  <PART_LIST>
    <PART>
      <PART_ID>1234</PART_ID>
      <PART_NAME>foobar</PART_NAME>
      <RESPONSIBLE>
        <USER_ID>abcd</USER_ID>
        <USER_NAME>Frank</USER_NAME>
        <DEPARTMENT>
          <DEPT_ID>D23E</DEPT_ID>
          <DEPT_NAME>Sales</DEPT_NAME>
        </DEPARTMENT>
      </RESPONSIBLE>
    </PART>
    <PART>
      <PART_ID>3456</PART_ID>
      <PART_NAME>Case</PART_NAME>
      <RESPONSIBLE>
        <USER_ID>cdef</USER_ID>
        <USER_NAME>Will</USER_NAME>
        <DEPARTMENT>
          <DEPT_ID>D23E</DEPT_ID>
          <DEPT_NAME>Sales</DEPT_NAME>
        </DEPARTMENT>
      </RESPONSIBLE>
    </PART>
    <PART>
      <PART_ID>5432</PART_ID>
      <PART_NAME>Switch</PART_NAME>
      <RESPONSIBLE>
        <USER_ID>abcd</USER_ID>
        <USER_NAME>Frank</USER_NAME>
        <DEPARTMENT>
          <DEPT_ID>D23E</DEPT_ID>
          <DEPT_NAME>Sales</DEPT_NAME>
        </DEPARTMENT>
      </RESPONSIBLE>
    </PART>
  </PART_LIST>
</PRODUCT>
The result in my database looks like this:
╔══════════╦════════════╦═══════════════╗
║ PRODUCT  ║            ║               ║
╠══════════╬════════════╬═══════════════╣
║ PROD_KEY ║ PRODUCT_ID ║ PRODUCT_NAME  ║
║ 1        ║ P1         ║ Super Product ║
║ 2        ║ P2         ║ Good Product  ║
╚══════════╩════════════╩═══════════════╝
╔═══════════╦══════════╗
║ PART_LIST ║          ║
╠═══════════╬══════════╣
║ LIST_KEY  ║ PROD_KEY ║
║ 1         ║ 1        ║
║ 2         ║ 2        ║
╚═══════════╩══════════╝
╔══════════╦══════════╦═════════╦═══════════╗
║ PART     ║          ║         ║           ║
╠══════════╬══════════╬═════════╬═══════════╣
║ PART_KEY ║ LIST_KEY ║ PART_ID ║ PART_NAME ║
║ 1        ║ 1        ║ 1234    ║ foobar    ║
║ 2        ║ 1        ║ 3456    ║ Case      ║
║ 3        ║ 1        ║ 6789    ║ Button    ║
║ 4        ║ 2        ║ 1234    ║ foobar    ║
║ 5        ║ 2        ║ 3456    ║ Case      ║
║ 6        ║ 2        ║ 5432    ║ Switch    ║
╚══════════╩══════════╩═════════╩═══════════╝
╔═════════════╦══════════╦═════════╦═══════════╗
║ RESPONSIBLE ║          ║         ║           ║
╠═════════════╬══════════╬═════════╬═══════════╣
║ RESP_KEY    ║ PART_KEY ║ USER_ID ║ USER_NAME ║
║ 1           ║ 1        ║ abcd    ║ Frank     ║
║ 2           ║ 2        ║ cdef    ║ Will      ║
║ 3           ║ 3        ║ efgh    ║ John      ║
║ 4           ║ 4        ║ abcd    ║ Frank     ║
║ 5           ║ 5        ║ cdef    ║ Will      ║
║ 6           ║ 6        ║ abcd    ║ Frank     ║
╚═════════════╩══════════╩═════════╩═══════════╝
╔════════════╦══════════╦═════════╦════════════╗
║ DEPARTMENT ║          ║         ║            ║
╠════════════╬══════════╬═════════╬════════════╣
║ DEPT_KEY   ║ RESP_KEY ║ DEPT_ID ║ DEPT_NAME  ║
║ 1          ║ 1        ║ D23E    ║ Sales      ║
║ 2          ║ 2        ║ D23E    ║ Sales      ║
║ 3          ║ 3        ║ D99T    ║ Production ║
║ 4          ║ 4        ║ D23E    ║ Sales      ║
║ 5          ║ 5        ║ D23E    ║ Sales      ║
║ 6          ║ 6        ║ D23E    ║ Sales      ║
╚════════════╩══════════╩═════════╩════════════╝
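Because of the hierarchical structure of the XML files, I ended up with a large number of duplicate entries (for example, the user "Frank" appears 3 times and the Sales department 5 times). My estimate is that about 75% of my data is duplicated. That is why I would really like to turn the data back into a proper relational database and remove all duplicate rows while keeping the relationships.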
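The 75% figure is an estimate. One way to check it is to compare, per table, the total row count with the number of distinct combinations of the columns that actually describe the entity (everything except the generated `*_KEY` columns, which are unique per row by construction). A minimal sketch, assuming that `DEPT_ID`/`DEPT_NAME` and `USER_ID`/`USER_NAME` are the identifying columns, with Python's built-in `sqlite3` standing in for MySQL; `duplicate_share` is a hypothetical helper name:

```python
import sqlite3


def duplicate_share(conn, table, key_cols):
    """Hypothetical helper: compare the row count of `table` with the number of
    distinct combinations of `key_cols` (the columns that identify the entity).
    Table and column names are interpolated, so pass trusted identifiers only."""
    cols = ", ".join(key_cols)
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    # MySQL requires an alias on the derived table; SQLite accepts it as well.
    distinct = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM {table}) AS d"
    ).fetchone()[0]
    share = (total - distinct) / total if total else 0.0
    return total, distinct, share


# sqlite3 stands in for MySQL; rows copied from the DEPARTMENT and RESPONSIBLE tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DEPARTMENT (DEPT_KEY INTEGER PRIMARY KEY, RESP_KEY INTEGER,
                         DEPT_ID TEXT, DEPT_NAME TEXT);
CREATE TABLE RESPONSIBLE (RESP_KEY INTEGER PRIMARY KEY, PART_KEY INTEGER,
                          USER_ID TEXT, USER_NAME TEXT);
INSERT INTO DEPARTMENT VALUES
  (1, 1, 'D23E', 'Sales'), (2, 2, 'D23E', 'Sales'), (3, 3, 'D99T', 'Production'),
  (4, 4, 'D23E', 'Sales'), (5, 5, 'D23E', 'Sales'), (6, 6, 'D23E', 'Sales');
INSERT INTO RESPONSIBLE VALUES
  (1, 1, 'abcd', 'Frank'), (2, 2, 'cdef', 'Will'), (3, 3, 'efgh', 'John'),
  (4, 4, 'abcd', 'Frank'), (5, 5, 'cdef', 'Will'), (6, 6, 'abcd', 'Frank');
""")

for table, cols in [("DEPARTMENT", ["DEPT_ID", "DEPT_NAME"]),
                    ("RESPONSIBLE", ["USER_ID", "USER_NAME"])]:
    total, distinct, share = duplicate_share(conn, table, cols)
    print(f"{table}: {total} rows, {distinct} distinct, {share:.0%} redundant")
```

On the sample rows this reports 4 of 6 DEPARTMENT rows and 3 of 6 RESPONSIBLE rows as redundant. The two SELECT statements inside the helper are plain SQL and could be run against the MySQL tables directly.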
Does anyone know how to do this?
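One common approach, sketched here for the lowest level only: build a lookup table with one row per distinct department (this is where the new primary keys are created), then reverse the foreign key so that RESPONSIBLE points at the department instead of each DEPARTMENT row pointing at one RESPONSIBLE row, and finally drop the old copies. This is a sketch under assumptions, not a tested migration: `sqlite3` stands in for MySQL (in MySQL the new key column would be declared `AUTO_INCREMENT`, and a multi-table `UPDATE ... JOIN` is an alternative to the correlated subquery), `DEPT_DIM` is a hypothetical staging-table name, and each RESPONSIBLE row is assumed to have exactly one DEPARTMENT row, as in the sample.

```python
import sqlite3

# Sketch under assumptions: sqlite3 stands in for MySQL, DEPT_DIM is a hypothetical
# staging-table name, and every RESPONSIBLE row owns exactly one DEPARTMENT row.
conn = sqlite3.connect(":memory:")

# Sample rows copied from the RESPONSIBLE and DEPARTMENT tables in the question.
conn.executescript("""
CREATE TABLE RESPONSIBLE (RESP_KEY INTEGER PRIMARY KEY, PART_KEY INTEGER,
                          USER_ID TEXT, USER_NAME TEXT);
CREATE TABLE DEPARTMENT (DEPT_KEY INTEGER PRIMARY KEY, RESP_KEY INTEGER,
                         DEPT_ID TEXT, DEPT_NAME TEXT);
INSERT INTO RESPONSIBLE VALUES
  (1, 1, 'abcd', 'Frank'), (2, 2, 'cdef', 'Will'), (3, 3, 'efgh', 'John'),
  (4, 4, 'abcd', 'Frank'), (5, 5, 'cdef', 'Will'), (6, 6, 'abcd', 'Frank');
INSERT INTO DEPARTMENT VALUES
  (1, 1, 'D23E', 'Sales'), (2, 2, 'D23E', 'Sales'), (3, 3, 'D99T', 'Production'),
  (4, 4, 'D23E', 'Sales'), (5, 5, 'D23E', 'Sales'), (6, 6, 'D23E', 'Sales');
""")

conn.executescript("""
-- 1. One row per distinct department; the new primary keys are generated here.
CREATE TABLE DEPT_DIM (DEPT_KEY INTEGER PRIMARY KEY,
                       DEPT_ID TEXT NOT NULL, DEPT_NAME TEXT,
                       UNIQUE (DEPT_ID, DEPT_NAME));
INSERT INTO DEPT_DIM (DEPT_ID, DEPT_NAME)
  SELECT DISTINCT DEPT_ID, DEPT_NAME FROM DEPARTMENT ORDER BY DEPT_ID, DEPT_NAME;

-- 2. Reverse the foreign key: one department now serves many responsibles,
--    so RESPONSIBLE gets a DEPT_KEY column pointing at the deduplicated row.
ALTER TABLE RESPONSIBLE ADD COLUMN DEPT_KEY INTEGER;
UPDATE RESPONSIBLE SET DEPT_KEY = (
  SELECT dd.DEPT_KEY
  FROM DEPARTMENT AS d
  JOIN DEPT_DIM AS dd ON dd.DEPT_ID = d.DEPT_ID AND dd.DEPT_NAME = d.DEPT_NAME
  WHERE d.RESP_KEY = RESPONSIBLE.RESP_KEY);

-- 3. Replace the old per-responsible copies with the lookup table.
DROP TABLE DEPARTMENT;
ALTER TABLE DEPT_DIM RENAME TO DEPARTMENT;
""")

print(conn.execute("SELECT * FROM DEPARTMENT ORDER BY DEPT_KEY").fetchall())
print(conn.execute(
    "SELECT RESP_KEY, USER_ID, DEPT_KEY FROM RESPONSIBLE ORDER BY RESP_KEY").fetchall())
```

The same three steps could then be repeated one level up (distinct `USER_ID`, `USER_NAME`, `DEPT_KEY` rows from RESPONSIBLE, with PART receiving the new user key). Between PART and PART_LIST the relationship is many-to-many, since part 1234 belongs to both products, so there a separate link table of (LIST_KEY, PART_KEY) pairs would take the place of the LIST_KEY column in PART.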
I appreciate any help,
Uli