Question

I have a listing of object files and the source files each one is built from, as shown below:

--
OBJECT_FILE: F:/XX/YY/ZZ/OperatingSystem.o  e4cd09e5fc1c74ec6a2e24c361f7103d3a4036a2    5
    F:/XX/YY/ZZ/OperatingSystem.cpp ba06447d296ceae294962bbf130406052ebb9d7c    2
    F:/XX/YY/ZZ/OperatingSystem.hpp     272b23c2590b2f2e908b9f7e148a1dfcb61183d8    4
    F:/XX/YY/ZZ/Types.h     2375eeec03d837b351a0c105e663dcea6aee434d    3
    F:/XX/YY/ZZ/time.h      f7a9165daf21d6f200ad656fbace652fcde11c4b    3
    F:/XX/YY/ZZ/cdefs.h     f0704e779b9252398f7859a05cc65bbd563cdd0e    3
    F:/XX/YY/ZZ/cdefs_elf.h     63e208b3b175f84e32918d08096693688d860869    3
    F:/XX/YY/ZZ/time.h      f4eaf411f6b1a8817f3baed1bfbacd4ea2f51b9f    3
    F:/XX/YY/ZZ/cdefs.h     f0704e779b9252398f7859a05cc65bbd563cdd0e    3
OBJECT_FILE: F:/XX/YY/ZZ/CIpInterface.o e4cd09e5fc1c74ec6a2e24c361f7103d3a4036a2    5
    F:/XX/YY/ZZ/CIpInterface.cpp    ba06447d296ceae294962bbf130406052ebb9d7c    2
    F:/XX/YY/ZZ/OperatingSystem.hpp     272b23c2590b2f2e908b9f7e148a1dfcb61183d8    4
    F:/XX/YY/ZZ/Types.h     2375eeec03d837b351a0c105e663dcea6aee434d    3
    F:/XX/YY/ZZ/time.h      f7a9165daf21d6f200ad656fbace652fcde11c4b    3
    F:/XX/YY/ZZ/cdefs.h     f0704e779b9252398f7859a05cc65bbd563cdd0e    3
    F:/XX/YY/ZZ/cdefs_elf.h     63e208b3b175f84e32918d08096693688d860869    3
    F:/XX/YY/ZZ/time.h      f4eaf411f6b1a8817f3baed1bfbacd4ea2f51b9f    3
    F:/XX/YY/ZZ/cdefs.h     f0704e779b9252398f7859a05cc65bbd563cdd0e    3
    F:/XX/YY/ZZ/malloc.h        f0704e779b9252398f7859a05cc65bbd563cdd0e    3
    F:/XX/YY/ZZ/stddef.h        f0704e779b9252398f7859a05cc65bbd563cdd0e    3
--

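The information above is in a text file. I have a Perl parser that reads the text file line by line and tries to insert each entry into the database. The object (.o) to source (.cpp/.c/.h/.hpp) relationship is stored in a separate many-to-many (mapping) table.

A summary of the code:

1. Read the OBJECT_FILE: line and insert it. The table has a PK, and the SHA1 checksum is unique.
2. Get its PK, say PK1.
3. Read the second line, i.e. OperatingSystem.cpp, and insert it. If there is an error, check the error code.
4. If the error code is "duplicate entry", or there is no error, get its PK, say PK2.
5. Insert (PK1, PK2) into the many-to-many relationship table. If there is an error and the error code is "duplicate entry", ignore it; on any other error, exit.

Sample code is below: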

--
            $sth = $dbh->prepare("INSERT INTO source_r_binarylist_table
                                (   Source_r_Binary_Id,
                                    Source_r_Binary_name,
                                    Source_r_Binary_FileName,
                                    Source_r_Binary_Version,
                                    FileTypeId,
                                    SourceofSource_r_Binary,
                                    Source_r_Binary_Supplier,
                                    Source_r_Binary_Originator,
                                    Source_r_Binary_Home_Page,
                                    Source_r_Binary_Download_Location,
                                    Source_r_Binary_Checksum,
                                    Source_r_Binary_Verification_Code,
                                    Excluded_Files,
                                    Source_Info,
                                    License_Concluded,
                                    LicenseIdsFromAllFiles,
                                    LicenseComments,
                                    Summary,
                                    Description,
                                    Technology_category
                                )
                                VALUES
                                    (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)");
            $error_code =0;
             $sth->execute(undef,undef,$file_name,undef,$file_type,undef,undef,undef,undef,undef,$sha1_chksum,undef,undef,undef,undef,undef,undef,undef,undef,undef)
             or $error_code = $sth->err;
            #print "Error code: $error_code ; return value $DBI::state \n";

            if (($error_code != 0) && ($error_code != ERROR_CODE_DUP_ENTRY))
            {
                die "File:[".__FILE__."] Line:[".__LINE__."]:ERROR: refer DB error code, and take appropriate action".$DBI::errstr;
            }

            $sth->finish();
            #$dbh->commit or die $DBI::errstr;

            # On the first insert the error code is 0. If the same file is entered again its checksum is unchanged,
            # so a duplicate-entry error (code 1062) is raised. In both cases, look up the row id by checksum.
            if((($error_code == 0) && ($file_type == $FILE_TYPE_OBJ)) ||(($error_code == ERROR_CODE_DUP_ENTRY) && ($file_type == $FILE_TYPE_OBJ)))
            {
                $sth = $dbh->prepare("SELECT Source_r_Binary_Id from source_r_binarylist_table where Source_r_Binary_Checksum = ?");
                $sth->execute( $sha1_chksum ) or die "File:[".__FILE__."] Line:[".__LINE__."]:ERROR:".$DBI::errstr;
                while (my @row = $sth->fetchrow_array())
                {
                        $src_bin_id_for_obj = $row[0];
                        #print "Source Bin Id = $src_bin_id_for_obj\n";
                }
                $sth->finish();
            }

            if((($error_code == 0) || ($error_code == ERROR_CODE_DUP_ENTRY))&& 
                (($file_type == $FILE_TYPE_SRC_C) || ($file_type == $FILE_TYPE_SRC_CPP)|| 
                ($file_type == $FILE_TYPE_SRC_H) || ($file_type == $FILE_TYPE_SRC_HPP) || 
                ($file_type == $FILE_TYPE_SRC_JAVA)))
            {
                    # get the Id of newly added source (cpp or c or h or hpp or java)
                $sth = $dbh->prepare("SELECT Source_r_Binary_Id from source_r_binarylist_table where Source_r_Binary_Checksum = ?");
                $sth->execute( $sha1_chksum ) or die "File:[".__FILE__."] Line:[".__LINE__."]:ERROR:".$DBI::errstr;

                while (my @row = $sth->fetchrow_array()) {
                        $src_bin_id_for_src = $row[0];
                        #print "Source Bin Id = $src_bin_id_for_src\n";
                }
                $sth->finish();


                $error_code_sub =0;
                #add an entry in to junction table
                $sth = $dbh->prepare(" INSERT INTO object_source_id_junctiontable (Object_Id_ref,Source_Id_ref) VALUES (?,?)");
                $sth->execute( $src_bin_id_for_obj, $src_bin_id_for_src) or  ($error_code_sub = $DBI::err);

                if (($error_code_sub != 0) && ($error_code_sub != ERROR_CODE_DUP_ENTRY))
                {
                    die "File:[".__FILE__."] Line:[".__LINE__."]:ERROR: $error_code_sub refer DB error code, and take appropriate action: ".$DBI::errstr;
                }
                $sth->finish();

                print "Line:[".__LINE__."]:Insertion successful [Obj Id: $src_bin_id_for_obj] [Source Id: $src_bin_id_for_src]\n";
            }
    }
--

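The code above has a huge performance problem: each line takes about one second to insert. If the input file has around 800,000 lines, the script needs roughly 800,000 seconds, which is on the order of 9 to 10 days, to finish.

Please advise on the best way to reduce the insert time. Changes to the DB can be made if required.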

Answer 1

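Please provide SHOW CREATE TABLE for the tables. I am especially interested in the tables' INDEX(es) and ENGINE. Also, how big are the tables?

Assuming Source_r_Binary_Id is AUTO_INCREMENT, you can get the id with LAST_INSERT_ID() or insert_id. That is a bit faster than the SELECT you are using. However, I see what looks like the same SELECT occurring twice; is that so?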
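
To illustrate the LAST_INSERT_ID() suggestion, here is a minimal sketch, not the asker's code. It assumes MySQL, that Source_r_Binary_Id is AUTO_INCREMENT, that Source_r_Binary_Checksum has a UNIQUE key, and that the omitted columns accept NULL. The helper name `upsert_file_id` is hypothetical. It goes one step further than the answer by using MySQL's `INSERT ... ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id)` idiom, so that a new row and a duplicate row both yield the id without a follow-up SELECT.

```perl
use strict;
use warnings;

# Hypothetical helper: insert a file row, or hit the existing row with the
# same checksum, and return its id in one round trip.
# Assumptions: MySQL, AUTO_INCREMENT Source_r_Binary_Id, UNIQUE key on
# Source_r_Binary_Checksum. $dbh is an already connected DBI handle.
sub upsert_file_id {
    my ($dbh, $file_name, $file_type, $sha1_chksum) = @_;

    # prepare_cached reuses the same statement handle on later calls,
    # so the SQL is not re-prepared for every input line.
    my $sth = $dbh->prepare_cached(q{
        INSERT INTO source_r_binarylist_table
            (Source_r_Binary_FileName, FileTypeId, Source_r_Binary_Checksum)
        VALUES (?, ?, ?)
        ON DUPLICATE KEY UPDATE
            Source_r_Binary_Id = LAST_INSERT_ID(Source_r_Binary_Id)
    });
    $sth->execute($file_name, $file_type, $sha1_chksum)
        or die "insert failed: " . $dbh->errstr;

    # New row: the generated id. Duplicate row: the existing id, because
    # LAST_INSERT_ID(expr) sets the value reported for this statement.
    return $dbh->last_insert_id(undef, undef,
        'source_r_binarylist_table', 'Source_r_Binary_Id');
}
```

With a helper like this, PK1 and PK2 from the question's summary would each come from one call, and only the junction-table insert remains as a separate statement.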

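Please boil the problem down to a sequence of SQL statements; it is hard to wade through the Perl(?) code.

Please provide `SHOW VARIABLES LIKE '%buffer%';`. How much RAM do you have?

Having a UNIQUE key on a sha1 is deadly for performance, but it still should not take a full second to do 2 inserts and 2 selects, each of them single-row and indexed.

Ah. If you have "indexed every column", then most of the time is going into the INSERT, since every index has to be updated for each new row.

I suggest you instrument the code (perhaps using Time::HiRes) to see which parts take the longest.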
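
One way to do that instrumentation is sketched below using the core Time::HiRes module. The names `timed` and `report_timings` and the step labels are hypothetical; the idea is only to wrap each database call and add up the wall-clock time per step.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Accumulated wall-clock seconds and call counts per labelled step.
my (%elapsed, %calls);

# Run a code block, add its wall-clock time to the bucket named $label,
# and pass through its (scalar) return value.
sub timed {
    my ($label, $code) = @_;
    my $t0     = [gettimeofday];
    my $result = $code->();
    $elapsed{$label} += tv_interval($t0);   # seconds since $t0
    $calls{$label}++;
    return $result;
}

# Print one line per step, slowest first.
sub report_timings {
    for my $label (sort { $elapsed{$b} <=> $elapsed{$a} } keys %elapsed) {
        printf "%-20s %8d calls %10.3f s\n",
            $label, $calls{$label}, $elapsed{$label};
    }
}
```

In the question's loop this might be used as `timed('insert_file', sub { $sth->execute(...) })` around each execute and each SELECT, with `report_timings()` called once after the last line is processed. Running it on a few thousand lines should be enough to show which statement dominates.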

(I may have more suggestions after you provide the information above.)
