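I'm running a Perl script that starts 30 threads to run a subroutine, and I give each thread 100 records to process. In the subroutine, once the code has done its work, I store the output in a CSV file. However, I've noticed that the CSV file ends up with some overlapping data. For example, I store name, age, gender and country in the CSV file like this: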
print OUTPUT $name.",".$age.",".$gender.",".$country.",4\n";
The CSV file should contain:
Randy,35,M,USA,4
Tina,76,F,UK,4
etc.
But in the CSV file I see that some columns overlap, or have been written in a seemingly random way:
Randy,35,M,USA,4
TinaMike,76,UK
23,F,4
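Is this because some threads are executing at the same time? What can I do to avoid it? I only call print after I have fetched the data. Any suggestions?
(4 is the group ID; it always stays the same.)
Here is the code snippet: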
#!/usr/bin/perl
use DBI;
use strict;
use warnings;
use threads;
use threads::shared;
my $host = "1.1.1.1";
my $database = "somedb";
my $user = "someuser";
my $pw = "somepwd";
my @threads;
open(PUT,">/tmp/file1.csv") || die "can not open file";
open(OUTPUT,">/tmp/file2.csv") || die "can not open file";
my $dbh = DBI->connect("DBI:mysql:$database;host=$host", $user, $pw ,) || die "Could not connect to database: $DBI::errstr";
$dbh->{'mysql_auto_reconnect'} = 1;
my $sql = qq{
# some SQL to get the primary keys
};
my $sth = $dbh->prepare($sql);
$sth->execute();
while(my @request = $sth->fetchrow_array())
{
#get other columns and print to file1.csv
print PUT $net.",".$sub.",4\n";
$i++; #this has been declared before
}
for ( my $count = 1; $count <= 30; $count++) {
my $t = threads->new(\&sub1, $count);
push(@threads,$t);
}
foreach (@threads) {
my $num = $_->join;
print "done with $num\n";
}
sub sub1 {
my $num = shift;
# calculate $start_num and $end_num based on some internal logic
for(my $x=$start_num; $x<=$end_num; $x++){
print OUTPUT $name.",".$age.",".$gender.",".$country.",4\n";
$j++; #this has been declared before
}
sleep(1);
return $num;
}
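The problem is in file2, which is written through the OUTPUT filehandle.
Answer 0 (score: 4)
You are multithreading and printing to a file from several threads at once. That will always end badly: print is not an 'atomic' operation, so different prints can interrupt each other.
What you need to do is serialize your output so that this can't happen. The easiest way is to use a lock or a semaphore: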
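use threads::shared;
use IO::Handle;    # provides the flush() method

my $print_lock : shared;

{
    lock $print_lock;
    print OUTPUT $stuff,"\n";
    OUTPUT->flush;    # write the line out while the lock is still held
}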
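The lock is released when it goes out of scope, i.e. at the end of the enclosing block.
Alternatively, have a single thread 'do' the file IO, and feed lines to it with Thread::Queue. Which approach suits you depends on whether you need any ordering or processing of the 'OUTPUT' content.
Something like: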
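As a rough illustration, a self-contained version of the lock approach shaped like the question (30 threads, 100 rows each) might look like the sketch below. The file name /tmp/file2_demo.csv, the worker sub and the generated row values are hypothetical stand-ins for the question's data. It also flushes the handle inside the locked block, on the assumption that each thread may keep its own output buffer; without the flush, that buffer could be written out later, outside the lock. It needs a perl built with thread support.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;
use IO::Handle;    # provides the flush() method on filehandles

# Hypothetical output file for this demo.
my $file = '/tmp/file2_demo.csv';
open( my $out, '>', $file ) or die "cannot open $file: $!";

# One shared variable used purely as a mutex for the output file.
my $print_lock : shared;

sub worker {
    my ($num) = @_;
    for my $x ( 1 .. 100 ) {
        # Hypothetical row standing in for name,age,gender,country,group.
        my $line = join( ',', "name${num}_$x", 20 + $x % 50, ( $x % 2 ? 'M' : 'F' ), 'USA', 4 );
        {
            lock $print_lock;          # only one thread may print at a time
            print {$out} $line, "\n";
            $out->flush;               # push the line to the file before unlocking
        }                              # lock is released when this block exits
    }
    return $num;
}

my @threads = map { threads->create( \&worker, $_ ) } 1 .. 30;
$_->join for @threads;
close($out) or die "cannot close $file: $!";
```

Every line in the resulting file should then be a complete `name,age,gender,country,4` record, although the order of rows from different threads is not defined.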
use threads;
use Thread::Queue;

my $output_q = Thread::Queue -> new();

sub output_thread {
    open ( my $output_fh, ">", "output_filename.csv" ) or die $!;
    # dequeue() returns undef once the queue has been ended and drained
    while ( defined ( my $output_line = $output_q -> dequeue() ) ) {
        print {$output_fh} $output_line,"\n";
    }
    close ( $output_fh );
}

sub doing_stuff_thread {
    $output_q -> enqueue ( "something to output" ); #\n added by sub!
}

my $output_thread = threads -> create ( \&output_thread );
my $doing_stuff_thread = threads -> create ( \&doing_stuff_thread );

#wait for doing_stuff to finish - ending the queue will cause output_thread to flush/exit.
$doing_stuff_thread -> join();
$output_q -> end;    # end() requires Thread::Queue 3.01 or later
$output_thread -> join();
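Applied to the question's layout (30 worker threads feeding one writer), the queue idea could look roughly like this sketch. Only the writer thread ever touches the file, so no lock is needed. The file name, the writer/worker subs and the row contents are hypothetical, and `$q->end` assumes Thread::Queue 3.01 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $file = '/tmp/file2_queue_demo.csv';    # hypothetical output file
my $q    = Thread::Queue->new();

# The only thread that writes to the file; it drains the queue until end() is called.
sub writer {
    open( my $fh, '>', $file ) or die "cannot open $file: $!";
    while ( defined( my $line = $q->dequeue() ) ) {
        print {$fh} $line, "\n";
    }
    close($fh) or die "cannot close $file: $!";
    return;
}

# Workers only build lines and enqueue them; they never touch the file.
sub worker {
    my ($num) = @_;
    for my $x ( 1 .. 100 ) {
        $q->enqueue( join( ',', "name${num}_$x", 20 + $x % 50, ( $x % 2 ? 'M' : 'F' ), 'USA', 4 ) );
    }
    return $num;
}

my $writer  = threads->create( \&writer );
my @workers = map { threads->create( \&worker, $_ ) } 1 .. 30;
print "done with ", $_->join, "\n" for @workers;
$q->end();        # no more items: dequeue() returns undef once the queue is empty
$writer->join;    # wait for the writer to drain the queue and close the file
```

A side benefit of this design is that the workers never wait on file IO; they only block briefly on the queue's internal lock.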
Answer 1 (score: 2)
Open the filehandle globally, then try using flock on it, like this:
use Fcntl qw(:flock SEEK_END);    # LOCK_EX, LOCK_UN and SEEK_END constants

sub log_write {
    my $line = shift;
    flock(OUTPUT, LOCK_EX) or die "can't lock: $!";
    seek(OUTPUT, 0, SEEK_END) or die "can't fast forward: $!";
    print OUTPUT $line;
    flock(OUTPUT, LOCK_UN) or die "can't unlock: $!";
}
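Note that flock arbitrates between processes: with flock(2) the lock belongs to the open file description (and with fcntl-based emulation, to the process), so threads sharing one handle do not block each other. Where it does fit is several processes appending to one file. The sketch below assumes that setting: four forked children (a hypothetical stand-in for separate scripts) each open the file themselves in append mode and hold an exclusive lock while writing each line. Perl flushes the handle before unlocking it, so each line is fully written before another process gets the lock. The file name and row contents are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:flock);

my $file = '/tmp/file2_flock_demo.csv';    # hypothetical output file

# Start with an empty file.
open( my $init, '>', $file ) or die "cannot create $file: $!";
close($init);

my @pids;
for my $num ( 1 .. 4 ) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        # Child: open its own handle in append mode so every write goes to the end.
        open( my $fh, '>>', $file ) or die "cannot open $file: $!";
        for my $x ( 1 .. 100 ) {
            flock( $fh, LOCK_EX ) or die "cannot lock: $!";
            print {$fh} join( ',', "name${num}_$x", 20 + $x % 50, ( $x % 2 ? 'M' : 'F' ), 'USA', 4 ), "\n";
            flock( $fh, LOCK_UN ) or die "cannot unlock: $!";    # perl flushes $fh before unlocking
        }
        close($fh);
        exit 0;
    }
    push @pids, $pid;
}

# Parent: wait for every child to finish.
waitpid( $_, 0 ) for @pids;
```

Because each child opens the file in append mode, the explicit seek from the snippet above is not needed here.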
Other examples: