CSV formatting problem when using multithreading in Perl

Time: 2014-08-13 13:18:23

Tags: multithreading perl

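I am running a Perl script that starts 30 threads, each of which runs a subroutine. I give each thread 100 data items. In the subroutine, after the code has done its work, I store the output in a CSV file. However, after the run I find that the CSV file contains overlapping data. For example, I store name, age, gender and country in the CSV file like this: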

print OUTPUT $name.",".$age.",".$gender.",".$country.",4\n";

The CSV file should contain:

Randy,35,M,USA,4
Tina,76,F,UK,4

However, in the CSV file I see that some columns overlap or are written out of order, like this:

Randy,35,M,USA,4
TinaMike,76,UK
23,F,4

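Is it because some threads are executing at the same time? What can I do to avoid this? I only call print after the data has been fetched. Any suggestions?

The 4 is the group ID, which stays constant.

Here is the code snippet: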

#!/usr/bin/perl

use DBI;
use strict;
use warnings;
use threads;
use threads::shared;

my $host = "1.1.1.1";
my $database = "somedb";
my $user = "someuser";
my $pw = "somepwd";

my @threads;


open(PUT,">/tmp/file1.csv") || die "can not open file";
open(OUTPUT,">/tmp/file2.csv") || die "can not open file";

my $dbh = DBI->connect("DBI:mysql:$database;host=$host", $user, $pw ,) || die "Could not connect to database: $DBI::errstr";
$dbh->{'mysql_auto_reconnect'} = 1;

my $sql = qq{
    //some sql to get a primary keys
};

my $sth = $dbh->prepare($sql);
$sth->execute();
while(my @request = $sth->fetchrow_array())
{
#get other columns and print to file1.csv
            print PUT $net.",".$sub.",4\n";
            $i++; #this has been declared before
}


for ( my $count = 1; $count <= 30; $count++) {
        my $t = threads->new(\&sub1, $count);
        push(@threads,$t);
}
foreach (@threads) {
        my $num = $_->join;
        print "done with $num\n";
}

sub sub1 {
        my $num = shift;

        # calculated start_num and end_num based on an internal logic

        for(my $x=$start_num; $x<=$end_num; $x++){

                print OUTPUT $name.",".$age.",".$gender.",".$country.",4\n";
                $j++; #this has been declared before
            }

        sleep(1);
        return $num;
}

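The problem is in file2, which is written through the OUTPUT filehandle.

2 Answers:

Answer 0: (score: 4)

You are multithreading and printing to one file from multiple threads. This will always end badly: print is not an 'atomic' operation, so different prints can interrupt each other.

What you need to do is serialize your output so that this cannot happen. The simplest way is to use a lock or a semaphore: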

    my $print_lock : shared;

    { 
        lock $print_lock; 
        print OUTPUT $stuff,"\n";
    }

When the lock goes out of scope, it is released.
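To show the lock in context, here is a minimal sketch of several threads writing rows to one file. The file path, the `worker` sub and the generated row values are made up for illustration. Turning on autoflush is an addition of mine, not part of the answer above: the assumption is that each thread works with its own buffered copy of the handle, so without a flush inside the locked block a buffer could still be written out in the middle of a line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use IO::Handle;    # for the autoflush method on a lexical filehandle

my $print_lock : shared;

# Hypothetical output path, opened once before any thread is created.
my $path = "/tmp/lock_demo.csv";
open(my $out, '>', $path) or die "cannot open $path: $!";
$out->autoflush(1);    # assumption: push each print out while the lock is held

sub worker {
    my ($id) = @_;
    for my $row (1 .. 100) {
        # Placeholder data standing in for name, age, gender, country.
        my $line = join(',', "name$id", $row, 'M', 'USA', 4);
        lock($print_lock);          # held until the end of this loop iteration
        print {$out} "$line\n";
    }
    return $id;
}

my @workers = map { threads->create(\&worker, $_) } 1 .. 30;
$_->join for @workers;
close($out) or die "cannot close $path: $!";
```

Because `lock` is scoped to the enclosing block, each loop iteration takes the lock, prints one complete row, and releases it again before the next iteration.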

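Alternatively, have a single thread that 'does' the file IO, and feed lines to it with Thread::Queue. Which approach fits depends on whether you need any ordering or processing of the 'OUTPUT' contents.

Something like: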

    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $output_q = Thread::Queue -> new();

    # Writer thread: the only thread that touches the output file.
    sub output_thread {
       open ( my $output_fh, ">", "output_filename.csv" ) or die $!;

       # dequeue() blocks until a line arrives and returns undef once the
       # queue has been ended and drained; defined() keeps "0" or "" lines.
       while ( defined ( my $output_line = $output_q -> dequeue() ) ) {
          print {$output_fh} $output_line,"\n";
       }

       close ( $output_fh );
    }

    sub doing_stuff_thread {
       $output_q -> enqueue ( "something to output" );  #\n added by output_thread
    }

    my $output_thread = threads -> create ( \&output_thread );
    my $doing_stuff_thread = threads -> create ( \&doing_stuff_thread );

    #wait for doing_stuff to finish - ending the queue will cause output_thread to drain it and exit.
    $doing_stuff_thread -> join();
    $output_q -> end;
    $output_thread -> join();

Answer 1: (score: 2)

Open the filehandle globally, then try using flock on the filehandle, like this:

use Fcntl qw(:flock SEEK_END);  # provides LOCK_EX, LOCK_UN and SEEK_END

sub log_write {
    my $line = shift;
    flock(OUTPUT, LOCK_EX)      or die "can't lock: $!";
    seek(OUTPUT, 0, SEEK_END)   or die "can't fast forward: $!";
    print OUTPUT $line;
    flock(OUTPUT, LOCK_UN)      or die "can't unlock: $!";
}
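One caveat that may be worth checking on your platform: with the native flock(2) call, as on Linux, a lock belongs to the open file rather than to the thread, so handles that all derive from a single open may not exclude one another. The sketch below is one possible way around that, not the answerer's code: each thread opens the file itself in append mode and locks its own handle. The path, the `worker` sub and the row values are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Fcntl qw(:flock);

my $path = "/tmp/flock_demo.csv";    # hypothetical output path

# Create or truncate the file once, before any thread starts.
open(my $init, '>', $path) or die "cannot create $path: $!";
close($init) or die "cannot close $path: $!";

sub log_write {
    my ($fh, $line) = @_;
    flock($fh, LOCK_EX) or die "can't lock: $!";
    print {$fh} $line;
    # Perl flushes the handle before locking or unlocking it.
    flock($fh, LOCK_UN) or die "can't unlock: $!";
}

sub worker {
    my ($id) = @_;
    # Each thread performs its own open, so each has a separate open file.
    open(my $fh, '>>', $path) or die "cannot open $path: $!";
    log_write($fh, join(',', "name$id", $_, 'F', 'UK', 4) . "\n") for 1 .. 100;
    close($fh) or die "cannot close $path: $!";
    return $id;
}

my @workers = map { threads->create(\&worker, $_) } 1 .. 10;
$_->join for @workers;
```

Append mode makes every write land at the current end of the file, so the explicit seek from the snippet above is not needed in this variant.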

Other examples: