Perl with MySQL is very slow, how can I speed it up?

Date: 2012-03-03 14:20:11

Tags: mysql performance perl

unit:         id, fir_name, sec_name
author:       id, name, unit_id
author_paper: id, author_id, paper_id

I want to unify authors ('the same author' means the same name and the same unit fir_name), and I must update the author_paper table at the same time.

This is what I did:

$conn->do('create index author_name on author (name)');
my $sqr = $conn->prepare("select name from author group by name having count(*) > 1");
$sqr->execute();
while(my @row = $sqr->fetchrow_array()) {
    my $dup_name = $row[0];
    $dup_name = formatHtml($dup_name);
    my $sqr2 = $conn->prepare("select id, unit_id from author where name = '$dup_name'");
    $sqr2->execute();

    my %fir_name_hash = ();
    while(my @row2 = $sqr2->fetchrow_array()) {
        my $author_id = $row2[0];
        my $unit_id = $row2[1];
        my $fir_name = getFirNameInUnit($conn, $unit_id);
        if (not exists $fir_name_hash{$fir_name}) {
            $fir_name_hash{$fir_name} = []; #anonymous arr reference
        }
        my $x = $fir_name_hash{$fir_name};
        push @$x, $author_id;
    }

    while(my ($fir_name, $author_id_arr) = each(%fir_name_hash)) {
        my $count = scalar @$author_id_arr;
        if ($count == 1) {next;}
        my $author_id = $author_id_arr->[0];
        for (my $i = 1; $i < $count; $i++) {
            #print "$author_id_arr->[$i] => $author_id\n";
            unifyAuthorAndAuthorPaperTable($conn, $author_id, $author_id_arr->[$i]); #just delete in author table, and update in author_paper table 
        }
    }
}

select count(*) from author;              # 240,000
select count(distinct(name)) from author; # 77,000

It is very slow!! It has been running for 5 hours and has only removed about 40,000 duplicate names. How can I make it run faster? I am eager for your advice.

2 Answers:

Answer 0 (score: 8)

You should not prepare the second SQL statement inside the loop, and by using a ? placeholder you can actually take advantage of that prepare:

$conn->do('create index author_name on author (name)');

my $sqr = $conn->prepare('select name from author group by name having count(*) > 1');

# ? is the placeholder; the database driver knows whether it's an integer or a string
# and quotes the input if needed.
my $sqr2 = $conn->prepare('select id, unit_id from author where name = ?');

$sqr->execute();
while(my @row = $sqr->fetchrow_array()) {
    my $dup_name = $row[0];
    $dup_name = formatHtml($dup_name);

    # Now you can reuse the prepared handle with different input
    $sqr2->execute( $dup_name );

    my %fir_name_hash = ();
    while(my @row2 = $sqr2->fetchrow_array()) {
        my $author_id = $row2[0];
        my $unit_id = $row2[1];
        my $fir_name = getFirNameInUnit($conn, $unit_id);
        if (not exists $fir_name_hash{$fir_name}) {
            $fir_name_hash{$fir_name} = []; #anonymous arr reference
        }
        my $x = $fir_name_hash{$fir_name};
        push @$x, $author_id;
    }

    while(my ($fir_name, $author_id_arr) = each(%fir_name_hash)) {
        my $count = scalar @$author_id_arr;
        if ($count == 1) {next;}
        my $author_id = $author_id_arr->[0];
        for (my $i = 1; $i < $count; $i++) {
            #print "$author_id_arr->[$i] => $author_id\n";
            unifyAuthorAndAuthorPaperTable($conn, $author_id, $author_id_arr->[$i]); #just delete in author table, and update in author_paper table 
        }
    }
}

This should also speed things up.

Answer 1 (score: 5)

When I see one query plus a loop, I suspect you have a latency problem: you run a query to get a set of values, then iterate over the set doing something else with each one. If that means a database round trip for every row in the set, that is a lot of latency.

It would be better if you could do this in a single query using an UPDATE with a sub-select, or if you could batch these requests and execute them all in one round trip; see the sketch below.
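For example, here is a minimal set-based sketch of that idea. It assumes the author with the smallest id in each (name, fir_name) group is the one to keep, ignores the formatHtml step (the stored names would have to already be in that form), and uses the hypothetical alias keep_id; the exact statements would need checking against the real schema and MySQL version:

# 1) point every paper at the lowest-id author sharing the same name and unit fir_name
$conn->do(<<'SQL');
UPDATE author_paper ap
JOIN author a ON a.id = ap.author_id
JOIN unit u   ON u.id = a.unit_id
JOIN (
    SELECT a2.name, u2.fir_name, MIN(a2.id) AS keep_id
    FROM author a2
    JOIN unit u2 ON u2.id = a2.unit_id
    GROUP BY a2.name, u2.fir_name
) dup ON dup.name = a.name AND dup.fir_name = u.fir_name
SET ap.author_id = dup.keep_id
WHERE ap.author_id <> dup.keep_id
SQL

# 2) remove the duplicate author rows that are no longer referenced
$conn->do(<<'SQL');
DELETE a FROM author a
JOIN unit u ON u.id = a.unit_id
JOIN (
    SELECT a2.name, u2.fir_name, MIN(a2.id) AS keep_id
    FROM author a2
    JOIN unit u2 ON u2.id = a2.unit_id
    GROUP BY a2.name, u2.fir_name
) dup ON dup.name = a.name AND dup.fir_name = u.fir_name
WHERE a.id <> dup.keep_id
SQL

Two statements replace the whole per-name loop, so the number of round trips no longer depends on the number of duplicate names.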

You will get an additional speedup if you use indexes wisely. Every column in a WHERE clause should have an index. Every foreign key should have an index.
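For this schema that would mean something along these lines (the author_name index is already created at the top of the question's code; the index names here are just examples, and whether the others help depends on what unifyAuthorAndAuthorPaperTable and getFirNameInUnit actually query):

$conn->do('create index author_unit_id on author (unit_id)');
$conn->do('create index author_paper_author_id on author_paper (author_id)');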

I would run EXPLAIN PLAN on your queries and see whether any TABLE SCAN is going on. If there is, you have to index properly.
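In MySQL specifically the statement is just EXPLAIN, and a full table scan shows up as type: ALL in its output; a quick check could look like this (the literal name is only an example):

EXPLAIN SELECT id, unit_id FROM author WHERE name = 'some duplicated name';
-- type: ALL with key: NULL indicates a full table scan;
-- with the author_name index in place you would expect type: ref, key: author_name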

I wonder whether a well-designed JOIN would come to your rescue?
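For instance, the per-author call to getFirNameInUnit inside the loop is itself one round trip per row; a JOIN could fetch the fir_name together with the author ids in one query. A rough sketch of that change to the loop above, assuming getFirNameInUnit simply looks up unit.fir_name by unit.id:

# prepared once, outside the loop, as in Answer 0
my $sqr2 = $conn->prepare(
    'select a.id, u.fir_name
       from author a
       join unit u on u.id = a.unit_id
      where a.name = ?'
);

# inside the per-name loop:
$sqr2->execute($dup_name);
my %fir_name_hash = ();
while (my ($author_id, $fir_name) = $sqr2->fetchrow_array()) {
    push @{ $fir_name_hash{$fir_name} }, $author_id;  # no per-row getFirNameInUnit call
}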

240,000 rows in one table, 77,000 in another. A big database.