MySQL - Delete duplicate rows and update or create new primary/foreign keys

Posted: 2016-02-05 08:42:21

Tags: mysql xml foreign-keys relational-database primary-key

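My goal is to delete duplicate rows across several related tables (MySQL) and to update or create new primary/foreign keys so that the relationships are preserved.

I imported a large number of XML files (more than 10,000 files, ~100 GB) into my MySQL database, which resulted in more than 50 tables. While importing the files, I created primary and foreign keys to maintain the relationships. That was a lot of work, so I don't want to run the import again, which is why I want to make the changes inside the database.

The XML files look like this: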

<PRODUCT>
    <PRODUCT_ID> P1 </PRODUCT_ID>
    <PRODUCT_NAME> Super Product </PRODUCT_NAME>
        <PART_LIST>
            <PART>
                <PART_ID> 1234 </PART_ID>
                <PART_NAME> foobar </PART_NAME>
                    <RESPONSIBLE>
                        <USER_ID> abcd </USER_ID>
                        <USER_NAME> Frank </USER_NAME>
                            <DEPARTMENT>
                                <DEPT_ID> D23E </DEPT_ID>
                                <DEPT_NAME> Sales </DEPT_NAME>
                            </DEPARTMENT>
                    </RESPONSIBLE>
            </PART>
            <PART>
                <PART_ID> 3456 </PART_ID>
                <PART_NAME> Case </PART_NAME>
                    <RESPONSIBLE>
                        <USER_ID> cdef </USER_ID>
                        <USER_NAME> Will </USER_NAME>
                            <DEPARTMENT>
                                <DEPT_ID> D23E </DEPT_ID>
                                <DEPT_NAME> Sales </DEPT_NAME>
                            </DEPARTMENT>
                    </RESPONSIBLE>
            </PART>
            <PART>
                <PART_ID> 6789 </PART_ID>
                <PART_NAME> Button </PART_NAME>
                    <RESPONSIBLE>
                        <USER_ID> efgh </USER_ID>
                        <USER_NAME> John </USER_NAME>
                            <DEPARTMENT>
                                <DEPT_ID> D99T </DEPT_ID>
                                <DEPT_NAME> Production </DEPT_NAME>
                            </DEPARTMENT>
                    </RESPONSIBLE>
            </PART>
        </PART_LIST>
</PRODUCT>
<PRODUCT>
    <PRODUCT_ID> P2 </PRODUCT_ID>
    <PRODUCT_NAME> Good Product </PRODUCT_NAME>
        <PART_LIST>
            <PART>
                <PART_ID> 1234 </PART_ID>
                <PART_NAME> foobar </PART_NAME>
                    <RESPONSIBLE>
                        <USER_ID> abcd </USER_ID>
                        <USER_NAME> Frank </USER_NAME>
                            <DEPARTMENT>
                                <DEPT_ID> D23E </DEPT_ID>
                                <DEPT_NAME> Sales </DEPT_NAME>
                            </DEPARTMENT>
                    </RESPONSIBLE>
            </PART>
            <PART>
                <PART_ID> 3456 </PART_ID>
                <PART_NAME> Case </PART_NAME>
                    <RESPONSIBLE>
                        <USER_ID> cdef </USER_ID>
                        <USER_NAME> Will </USER_NAME>
                            <DEPARTMENT>
                                <DEPT_ID> D23E </DEPT_ID>
                                <DEPT_NAME> Sales </DEPT_NAME>
                            </DEPARTMENT>
                    </RESPONSIBLE>
            </PART>
            <PART>
                <PART_ID> 5432 </PART_ID>
                <PART_NAME> Switch </PART_NAME>
                    <RESPONSIBLE>
                        <USER_ID> abcd </USER_ID>
                        <USER_NAME> Frank </USER_NAME>
                            <DEPARTMENT>
                                <DEPT_ID> D23E </DEPT_ID>
                                <DEPT_NAME> Sales </DEPT_NAME>
                            </DEPARTMENT>
                    </RESPONSIBLE>
            </PART>
        </PART_LIST>
</PRODUCT>

The result in my database looks like this:

╔══════════╦════════════╦═══════════════╗
║ PRODUCT  ║            ║               ║
╠══════════╬════════════╬═══════════════╣
║ PROD_KEY ║ PRODUCT_ID ║ PRODUCT_NAME  ║
║ 1        ║ P1         ║ Super Product ║
║ 2        ║ P2         ║ Good Product  ║
╚══════════╩════════════╩═══════════════╝
╔═══════════╦══════════╗
║ PART_LIST ║          ║
╠═══════════╬══════════╣
║ LIST_KEY  ║ PROD_KEY ║
║ 1         ║ 1        ║
║ 2         ║ 2        ║
╚═══════════╩══════════╝
╔══════════╦══════════╦═════════╦═══════════╗
║   PART   ║          ║         ║           ║
╠══════════╬══════════╬═════════╬═══════════╣
║ PART_KEY ║ LIST_KEY ║ PART_ID ║ PART_NAME ║
║ 1        ║ 1        ║ 1234    ║ foobar    ║
║ 2        ║ 1        ║ 3456    ║ Case      ║
║ 3        ║ 1        ║ 6789    ║ Button    ║
║ 4        ║ 2        ║ 1234    ║ foobar    ║
║ 5        ║ 2        ║ 3456    ║ Case      ║
║ 6        ║ 2        ║ 5432    ║ Switch    ║
╚══════════╩══════════╩═════════╩═══════════╝
╔═════════════╦══════════╦═════════╦═══════════╗
║ RESPONSIBLE ║          ║         ║           ║
╠═════════════╬══════════╬═════════╬═══════════╣
║ RESP_KEY    ║ PART_KEY ║ USER_ID ║ USER_NAME ║
║ 1           ║ 1        ║ abcd    ║ Frank     ║
║ 2           ║ 2        ║ cdef    ║ Will      ║
║ 3           ║ 3        ║ efgh    ║ John      ║
║ 4           ║ 4        ║ abcd    ║ Frank     ║
║ 5           ║ 5        ║ cdef    ║ Will      ║
║ 6           ║ 6        ║ abcd    ║ Frank     ║
╚═════════════╩══════════╩═════════╩═══════════╝
╔════════════╦══════════╦═════════╦════════════╗
║ DEPARTMENT ║          ║         ║            ║
╠════════════╬══════════╬═════════╬════════════╣
║ DEPT_KEY   ║ RESP_KEY ║ DEPT_ID ║ DEPT_NAME  ║
║ 1          ║ 1        ║ D23E    ║ Sales      ║
║ 2          ║ 2        ║ D23E    ║ Sales      ║
║ 3          ║ 3        ║ D99T    ║ Production ║
║ 4          ║ 4        ║ D23E    ║ Sales      ║
║ 5          ║ 5        ║ D23E    ║ Sales      ║
║ 6          ║ 6        ║ D23E    ║ Sales      ║
╚════════════╩══════════╩═════════╩════════════╝

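Because of the hierarchical structure of the XML files, I generated a lot of duplicate entries (for example, the user "Frank" appears 3 times and the Sales department 5 times). My guess is that 75% of my data is duplicated. That is why I really want to turn the data back into a truly relational database and remove all duplicate rows while preserving the relationships.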
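
A minimal sketch of one way this kind of deduplication might be approached, shown only for the two innermost tables (RESPONSIBLE and DEPARTMENT). It assumes that USER_ID and DEPT_ID are reliable natural keys and that each user belongs to exactly one department; neither is confirmed by the data above. The table names DEPT_DEDUP, USER_DEDUP and PART_RESPONSIBLE are hypothetical. SQLite (via Python's `sqlite3`) is used only so the sketch runs without a server; in MySQL the surrogate key would be declared with `AUTO_INCREMENT` instead of `AUTOINCREMENT`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized tables with the sample rows from the tables above.
cur.executescript("""
CREATE TABLE RESPONSIBLE (RESP_KEY INTEGER PRIMARY KEY, PART_KEY INTEGER,
                          USER_ID TEXT, USER_NAME TEXT);
CREATE TABLE DEPARTMENT  (DEPT_KEY INTEGER PRIMARY KEY, RESP_KEY INTEGER,
                          DEPT_ID TEXT, DEPT_NAME TEXT);
INSERT INTO RESPONSIBLE VALUES
  (1,1,'abcd','Frank'),(2,2,'cdef','Will'),(3,3,'efgh','John'),
  (4,4,'abcd','Frank'),(5,5,'cdef','Will'),(6,6,'abcd','Frank');
INSERT INTO DEPARTMENT VALUES
  (1,1,'D23E','Sales'),(2,2,'D23E','Sales'),(3,3,'D99T','Production'),
  (4,4,'D23E','Sales'),(5,5,'D23E','Sales'),(6,6,'D23E','Sales');
""")

cur.executescript("""
-- Step 1: one department row per DEPT_ID, with a fresh surrogate key.
CREATE TABLE DEPT_DEDUP (DEPT_KEY INTEGER PRIMARY KEY AUTOINCREMENT,
                         DEPT_ID TEXT UNIQUE, DEPT_NAME TEXT);
INSERT INTO DEPT_DEDUP (DEPT_ID, DEPT_NAME)
  SELECT DEPT_ID, MIN(DEPT_NAME) FROM DEPARTMENT
  GROUP BY DEPT_ID ORDER BY MIN(DEPT_KEY);

-- Step 2: one user row per USER_ID. The foreign key now points from the
-- user to the department (assumption: one department per user).
CREATE TABLE USER_DEDUP (USER_KEY INTEGER PRIMARY KEY AUTOINCREMENT,
                         USER_ID TEXT UNIQUE, USER_NAME TEXT,
                         DEPT_KEY INTEGER REFERENCES DEPT_DEDUP(DEPT_KEY));
INSERT INTO USER_DEDUP (USER_ID, USER_NAME, DEPT_KEY)
  SELECT r.USER_ID, MIN(r.USER_NAME), MIN(n.DEPT_KEY)
  FROM RESPONSIBLE r
  JOIN DEPARTMENT d ON d.RESP_KEY = r.RESP_KEY
  JOIN DEPT_DEDUP n ON n.DEPT_ID = d.DEPT_ID
  GROUP BY r.USER_ID ORDER BY MIN(r.RESP_KEY);

-- Step 3: remap every part to its deduplicated user via the natural key.
CREATE TABLE PART_RESPONSIBLE AS
  SELECT r.PART_KEY, u.USER_KEY
  FROM RESPONSIBLE r JOIN USER_DEDUP u ON u.USER_ID = r.USER_ID;
""")

dept_count = cur.execute("SELECT COUNT(*) FROM DEPT_DEDUP").fetchone()[0]
user_count = cur.execute("SELECT COUNT(*) FROM USER_DEDUP").fetchone()[0]

# Check that every part still resolves to the same user and department.
part_users = cur.execute("""
  SELECT p.PART_KEY, u.USER_ID, u.USER_NAME, d.DEPT_ID, d.DEPT_NAME
  FROM PART_RESPONSIBLE p
  JOIN USER_DEDUP u ON u.USER_KEY = p.USER_KEY
  JOIN DEPT_DEDUP d ON d.DEPT_KEY = u.DEPT_KEY
  ORDER BY p.PART_KEY
""").fetchall()

print(dept_count, user_count)
for row in part_users:
    print(row)
```

The original tables are left untouched until the remapped result has been checked, so they could be dropped only afterwards. The same pattern would presumably extend to PART, where a part such as 1234 appears in several part lists and a junction table between parts and lists might be needed.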

Does anyone know how to do this?

I appreciate any help, Uli

0 answers