Question

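I am running a Perl script that spawns 30 threads, each of which runs a subroutine. I give each thread 100 items of data. In the subroutine, once the code has done its work, I store the output in a CSV file. However, after a run I find that the CSV file contains overlapping data. For example, I store name, age, gender, and country in the CSV file like this: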

print OUTPUT $name.",".$age.",".$gender.",".$country.",4\n";

The CSV file should contain:

Randy,35,M,USA,4
Tina,76,F,UK,4

and so on.

Instead, I see that some columns in the CSV file overlap or are written in the wrong order, like this:

Randy,35,M,USA,4
TinaMike,76,UK
23,F,4

Is this because some threads are executing at the same time? What can I do to avoid it? I only call print after I have fetched the data. Any suggestions?

The 4 is the group ID, which stays constant.

Here is the code snippet:

#!/usr/bin/perl

use DBI;
use strict;
use warnings;
use threads;
use threads::shared;

my $host = "1.1.1.1";
my $database = "somedb";
my $user = "someuser";
my $pw = "somepwd";

my @threads;


open(PUT,">/tmp/file1.csv") || die "can not open file";
open(OUTPUT,">/tmp/file2.csv") || die "can not open file";

my $dbh = DBI->connect("DBI:mysql:$database;host=$host", $user, $pw ,) || die "Could not connect to database: $DBI::errstr";
$dbh->{'mysql_auto_reconnect'} = 1;

my $sql = qq{
    -- some SQL to get the primary keys
};

my $sth = $dbh->prepare($sql);
$sth->execute();
while(my @request = $sth->fetchrow_array())
{
#get other columns and print to file1.csv
            print PUT $net.",".$sub.",4\n";
            $i++; #this has been declared before
}


for ( my $count = 1; $count <= 30; $count++) {
        my $t = threads->new(\&sub1, $count);
        push(@threads,$t);
}
foreach (@threads) {
        my $num = $_->join;
        print "done with $num\n";
}

sub sub1 {
        my $num = shift;

        # calculated start_num and end_num based on an internal logic

        for(my $x=$start_num; $x<=$end_num; $x++){

                print OUTPUT $name.",".$age.",".$gender.",".$country.",4\n";
                $j++; #this has been declared before
            }

        sleep(1);
        return $num;
}

The problem is in file2, the one written through the OUTPUT filehandle.

Answer 1

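You are multithreading and printing to a file from multiple threads. This will always end badly: print is not an 'atomic' operation, so different prints can interrupt each other.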

What you need to do is serialize your output so that this cannot happen. The simplest way is to use a lock or a semaphore:

    my $print_lock : shared;

    { 
        lock $print_lock; 
        print OUTPUT $stuff,"\n";
    }

When the 'lock' goes out of scope, it is released.
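To see the lock at work with several writers, here is a minimal sketch. The worker count, the row contents, and the File::Temp scratch file are assumptions made for illustration; the question writes to /tmp/file2.csv through the OUTPUT handle instead. Autoflush is switched on so that each print reaches the file while the lock is still held.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use IO::Handle;
use File::Temp qw(tempfile);

# Hypothetical scratch file standing in for /tmp/file2.csv.
my ( $out_fh, $out_path ) = tempfile( SUFFIX => '.csv' );
$out_fh->autoflush(1);    # each print is written out before the lock is released

my $print_lock : shared;  # every write to $out_fh happens while holding this

sub worker {
    my ($id) = @_;
    for my $row ( 1 .. 100 ) {
        # Made-up row data; build the whole line first, then print it in one call.
        my $line = join( ',', "name$id-$row", 20 + $id, 'M', 'USA', 4 );
        lock($print_lock);            # held until the end of this loop iteration
        print {$out_fh} "$line\n";
    }
    return $id;
}

my @workers = map { threads->create( \&worker, $_ ) } 1 .. 5;
$_->join for @workers;
close($out_fh) or die "cannot close $out_path: $!";

# Read the file back and count lines that do not look like a complete row.
open( my $in_fh, '<', $out_path ) or die "cannot reopen $out_path: $!";
chomp( my @lines = <$in_fh> );
close($in_fh);
unlink $out_path;

my @bad = grep { !/^name\d+-\d+,\d+,M,USA,4$/ } @lines;
printf "%d lines written, %d malformed\n", scalar @lines, scalar @bad;
```

Rows from different threads still arrive in an arbitrary order; the lock only guarantees that each row is written whole.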

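Alternatively, have a single thread that 'does' the file IO, and use Thread::Queue to feed lines to it. Which approach fits depends on whether you need any ordering or processing of the contents of 'OUTPUT'.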

Something like:

    use Thread::Queue;

    my $output_q = Thread::Queue -> new();


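    sub output_thread {
        open ( my $output_fh, ">", "output_filename.csv" ) or die $!;

        # dequeue() returns undef once the queue has been ended and drained;
        # testing defined() keeps a line such as "0" from stopping the loop early
        while ( defined ( my $output_line = $output_q -> dequeue() ) ) {
            print {$output_fh} $output_line,"\n";
        }

        close ( $output_fh );
    }

    sub doing_stuff_thread {
        $output_q -> enqueue ( "something to output" );  # output_thread appends the \n
    }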


     my $output_thread = threads -> create ( \&output_thread );
     my $doing_stuff_thread = threads -> create ( \&doing_stuff_thread );

     #wait for doing_stuff to finish - closing the queue will cause output_thread to flush/exit. 
     $doing_stuff_thread -> join();
     $output_q -> end;
     $output_thread -> join();
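The same pattern extends to many producers feeding one writer. The following is a sketch under assumptions: five producers, made-up row data, and a File::Temp scratch path are all hypothetical choices, not part of the question's script. Only the writer thread ever touches the file, so no lock is needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use File::Temp qw(tempfile);

# Hypothetical scratch path; the writer thread opens it itself.
my ( $tmp_fh, $csv_path ) = tempfile( SUFFIX => '.csv' );
close($tmp_fh);

my $line_q = Thread::Queue->new();

# The only thread that prints to the file.
sub writer {
    open( my $fh, '>', $csv_path ) or die "cannot open $csv_path: $!";
    # dequeue() blocks until a line arrives; it returns undef after end()
    # has been called and the queue is empty.
    while ( defined( my $line = $line_q->dequeue() ) ) {
        print {$fh} "$line\n";
    }
    close($fh) or die "cannot close $csv_path: $!";
    return;
}

# Producers build complete rows and hand them to the queue.
sub producer {
    my ($id) = @_;
    for my $row ( 1 .. 100 ) {
        $line_q->enqueue( join( ',', "name$id-$row", 30 + $id, 'F', 'UK', 4 ) );
    }
    return $id;
}

my $writer_thr = threads->create( \&writer );
my @producers  = map { threads->create( \&producer, $_ ) } 1 .. 5;

$_->join for @producers;    # all rows are queued once these return
$line_q->end();             # lets the writer drain the queue and exit
$writer_thr->join();

# Read the result back to check it.
open( my $check_fh, '<', $csv_path ) or die "cannot reopen $csv_path: $!";
chomp( my @written = <$check_fh> );
close($check_fh);
unlink $csv_path;

my @broken = grep { !/^name\d+-\d+,\d+,F,UK,4$/ } @written;
printf "%d rows written, %d malformed\n", scalar @written, scalar @broken;
```

The queue must be ended only after every producer has been joined; ending it earlier would make later enqueue calls fail.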

Answer 2

Open the filehandle globally, then try using flock on the filehandle, like this:

use Fcntl qw(:flock SEEK_END);  # provides LOCK_EX, LOCK_UN and SEEK_END

sub log_write {
    my $line = shift;
    flock(OUTPUT, LOCK_EX)      or die "can't lock: $!";
    seek(OUTPUT, 0, SEEK_END)   or die "can't fast forward: $!";
    print OUTPUT $line;
    flock(OUTPUT, LOCK_UN)      or die "can't unlock: $!";
}
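For reference, here is a minimal end-to-end sketch of wiring up such a helper. It takes the handle as an argument (log_write_to is a hypothetical variant of log_write) and writes to a File::Temp scratch file, both assumptions made for illustration. The calls are sequential, so the sketch shows only the calling convention and the required Fcntl import; it does not demonstrate how flock behaves between threads that share one handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_END);
use File::Temp qw(tempfile);

# Hypothetical scratch file standing in for the globally opened OUTPUT handle.
my ( $log_fh, $log_path ) = tempfile( SUFFIX => '.csv' );

# Same steps as log_write, with the handle passed in explicitly.
sub log_write_to {
    my ( $fh, $line ) = @_;
    flock( $fh, LOCK_EX )    or die "can't lock: $!";
    seek( $fh, 0, SEEK_END ) or die "can't seek to end: $!";
    print {$fh} $line;
    flock( $fh, LOCK_UN )    or die "can't unlock: $!";
    return;
}

log_write_to( $log_fh, join( ',', 'Randy', 35, 'M', 'USA', 4 ) . "\n" );
log_write_to( $log_fh, join( ',', 'Tina',  76, 'F', 'UK',  4 ) . "\n" );
close($log_fh) or die "cannot close $log_path: $!";

# Read the file back to confirm both rows were appended whole.
open( my $read_fh, '<', $log_path ) or die "cannot reopen $log_path: $!";
chomp( my @logged = <$read_fh> );
close($read_fh);
unlink $log_path;

print "$_\n" for @logged;
```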

Other examples:

perlfaq5 - I still don't get locking. I just want to increment the number in the file. How can I do this?

CSV formatting problem when using multithreading in Perl
