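I have written a Perl script that inserts data from a text file into a database, but I would like to know how to add a quality check to it, i.e. a way to verify whether the data was actually inserted, so that it reports that the data was inserted successfully. Also, when the date is inserted into the database from the text file, it shows up only as 0000-00-00. What changes do I need to make?
My code is: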
#!/usr/bin/perl
#---------------------------------------------------------------------
# Description: Extract Lab data from text file and insert to database
#---------------------------------------------------------------------
# Modules Required
use DBI; # check drivers
#print "vs2-001-001-ma-sampleFile\n";
my $filename = "vs2-001-001-ma-sampleFile.txt";
#initialize variable $count
my $count = 0 ;
#initialise variables for parameters
my ($paraval, $paraname, $pararange, $paraunit);
#uncomment it To use keyboard input. and type filename with extension
# Ex: fileName.txt or fileName.csv
#chomp($filename=<>);
open (OUT,">>$filename.csv") || die "Cannot open $filename.csv: $!";
close OUT;
open (IN,"$filename") || die "Cannot open $filename: $!";
my @file=<IN>;
#join the lines with # as the delimiter
my $string = join('#', @file);
$string =~s /[\r]//g; # remove carriage returns
$string =~s /[\n]//g; # remove newlines
$string =~s /[\t]//g; # remove tabs
print "\n Parsing data now....\n";
# pattern under while loop will do the work.
# it will take date as 13 Oct 2010 in $1 and rest values in $2
# $string=~/Equine Profile Plus\s+#(.*?\s+)\s+.*?(Sample.*)##/g
while($string=~/Equine Profile Plus\s+#(.*?\s+)\s+.*?(Sample.*?)##/g)
{
my($date,$line,$Sample_Type,$Patient_ID, $Sample_Id,
$Doctor_Id,$Location,$Rotor, $Serial,$para,
$QC,$HEM,$LIP,$ICT);
$count++;
$date=$1;
$line=$2;
if ($line=~/Sample Type:(.*?)#/gis){
$Sample_Type=clean($1);
}if ($line=~/Patient ID:(.*?)#/gis){
$Patient_ID=clean($1);
}if ($line=~/Sample ID:(.*?)#/gis){
$Sample_Id=clean($1);
}if ($line=~/Doctor ID:(.*?)#/gis){
$Doctor_Id=clean($1);
}if ($line=~/Location:(.*?)#/gis){
$Location=clean($1);
}if ($line=~/Rotor Lot Number:(.*?)#/gis){
$Rotor=clean($1);
}if ($line=~/Serial Number:(.*?)#/gis){
$Serial=clean($1);
}if ($line=~/#(NA+.*?GLOB.*?)#/gis){
$para=$1;
$para =~ s/#/;/g;
$para =~ s/\s\s/ /g; #remove spaces.
$para =~ s/\s\s/ /g;
$para =~ s/\s\s/ /g;
$para =~ s/\s\s/ /g;
$para =~ s/\s\s/ /g;
$para =~ s/\s\s/ /g;
$para =~ s/ /:/g;
if ($line=~/#QC(.*?) #HEM(.*?) LIP(.*?) ICT(.*?) /gis){
$QC=clean($1);
$HEM=clean($2);
$LIP=clean($3);
$ICT=clean($4);
}
while($para =~ /(.*?):(.*?):(.*?);/g){
$paraname = $1;
$paraval = $2;
$pararange = $3;
#$paraunit = $4;
#data from text file written to a CSV file.
open (OUT,">>$filename.csv") || die "Cannot open $filename.csv: $!";
print OUT "\"$count\",\"$date\",\"$Sample_Type\",\"$Patient_ID\",
\"$Sample_Id\",\"$Doctor_Id\",\"$Location\",\"$Rotor\",
\"$Serial\", \"$QC\",\"$HEM\",\"$LIP\",\"$ICT\",
\"$paraname\",\"$paraval\",\"$pararange\",\n";
}
}
}
close OUT;
#Load csv into mysql
print "\n Inserting into data base \n";
# comment it while not loading into the database.
&loaddata("$filename.csv");
print "\n Database insert completed \n";
sub clean
{
my ($line) = shift (@_);
$line =~ s/\n//g;
$line =~ s/\r//g;
$line =~ s/^\s+//g;
$line =~ s/\s\s//g;
$line =~ s/\s+$//g;
$line =~ s/#//g;
return ($line);
}
#init the mysql DB
sub init_dbh{
$db="parameters";
$host="localhost";
$user="**";
$password="**";
my $dbh = DBI->connect ("DBI:mysql:database=$db:host=$host",
$user,
$password)
or die "Can't connect to database: $DBI::errstr\n";
return $dbh;
}
#Load data to mysql table
sub loaddata{
my ($name) = @_;
my $DBH = init_dbh( );
my $STH_GO = $DBH->prepare(q{
LOAD DATA LOCAL INFILE 'vs2-001-001-ma-sampleFile.txt.csv'
INTO TABLE parameter FIELDS TERMINATED BY ',' ENCLOSED BY
'"' LINES TERMINATED BY '\n'; })or die "ERROR: ". $DBI::errstr;
$STH_GO->execute();
}
Answer 0: (score: 2)
Check the return value of execute, for one thing.
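As a minimal sketch of what that check could look like: DBI's execute returns undef on failure, the string "0E0" (true, but numerically zero) when the statement ran and affected no rows, -1 when the row count is unknown, and otherwise the number of rows affected. The helper name report_execute is made up for illustration, and it is an assumption that DBD::mysql reports the number of loaded rows for a LOAD DATA statement.

```perl
use strict;
use warnings;

# Hypothetical helper: run an already-prepared DBI statement handle and
# turn execute's return value into a human-readable status line.
# $sth is expected to come from $dbh->prepare(...) in the calling script.
sub report_execute {
    my ($sth, @bind) = @_;

    my $rv = $sth->execute(@bind);

    # undef means the statement failed; errstr holds the driver's message.
    return "ERROR: " . $sth->errstr unless defined $rv;

    # -1 means the driver could not tell how many rows were affected.
    return "OK: statement ran, row count unknown" if $rv == -1;

    # "0E0" is true but numerically zero: success, yet nothing was inserted.
    return "WARNING: statement ran but no rows were inserted" if $rv == 0;

    return "OK: $rv row(s) inserted";
}
```

In the question's loaddata sub, the bare `$STH_GO->execute();` could then become `print report_execute($STH_GO), "\n";`. Passing `{ RaiseError => 1 }` as the fourth argument to `DBI->connect` is an alternative that makes every failed DBI call die on its own.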
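Answer 1: (score: 2)
I usually load the data programmatically from my own code rather than relying on the database to load it. That way I can validate each record before inserting it. Another advantage is that I know when a record fails to insert, and I can either try to work out what went wrong and reinsert it, or push the record to another file for manual inspection later.
In your code you process the data and then write it back out to a file for the database to load. Why not insert the rows as you process them? Letting the database do a bulk load is faster, but it does not give you much granularity: it is usually all or nothing, and if nothing gets loaded the error you get back will not tell you much beyond the fact that the file did not load.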
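A sketch of that row-at-a-time approach, under several assumptions: the column names sample_date, patient_id, para_name and para_value are invented (the question does not show the schema of the parameter table), and the helpers to_mysql_date and insert_record are hypothetical. It also touches the 0000-00-00 symptom from the question: a MySQL DATE column expects 'YYYY-MM-DD', and with strict SQL mode off a string it cannot parse, such as '13 Oct 2010', is most likely being stored as the zero date. Converting the date in Perl first is one way to validate it.

```perl
use strict;
use warnings;

my %MONTH = (
    jan => 1, feb => 2,  mar => 3,  apr => 4,  may => 5,  jun => 6,
    jul => 7, aug => 8,  sep => 9,  oct => 10, nov => 11, dec => 12,
);

# Convert "13 Oct 2010" to "2010-10-13".
# Returns undef (in scalar context) if the text does not look like a date.
sub to_mysql_date {
    my ($text) = @_;
    return unless defined $text;
    my ($day, $mon, $year) =
        $text =~ /^\s*(\d{1,2})\s+([A-Za-z]{3})[A-Za-z]*\s+(\d{4})\s*$/
        or return;
    my $month = $MONTH{ lc $mon } or return;
    return if $day < 1 || $day > 31;
    return sprintf "%04d-%02d-%02d", $year, $month, $day;
}

# Insert one record through a prepared statement handle.
# Returns 1 on success; on failure returns 0 and appends the record,
# with a reason, to the @$rejects list for later manual inspection.
# Assumes RaiseError is off, so a failed execute returns undef.
sub insert_record {
    my ($sth, $rec, $rejects) = @_;

    my $date = to_mysql_date($rec->{date});
    if (!defined $date) {
        push @$rejects, { %$rec, reason => "bad date" };
        return 0;
    }

    my $rv = $sth->execute($date, @{$rec}{qw(patient_id para_name para_value)});
    if (!defined $rv) {
        push @$rejects, { %$rec, reason => $sth->errstr };
        return 0;
    }
    return 1;
}

# Assumed usage, with $dbh from DBI->connect and hypothetical column names:
#   my $sth = $dbh->prepare(
#       'INSERT INTO parameter (sample_date, patient_id, para_name, para_value)
#        VALUES (?, ?, ?, ?)');
#   insert_record($sth, \%record, \@rejects);   # once per parsed row
```

Counting the 1s returned by insert_record gives the "inserted successfully" figure the question asks for, and the rejects list shows which rows were skipped and why.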
You are also slurping the file into memory, so I recommend reading PerlFaq 5, which has a good section on "How can I read in an entire file all at once?". The Perl Slurp Ease page might be more than you want to know.
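For reference, a small sketch of the two reading styles being contrasted, using a lexical filehandle and three-argument open. The sub names slurp_file and each_line are illustrative only. The slurp uses the common idiom of locally undefining `$/`, the input record separator.

```perl
use strict;
use warnings;

# Read a whole file into one string (slurp).
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    local $/;    # no record separator: the next read returns the whole file
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Call $callback once per line, holding only one line in memory at a time.
sub each_line {
    my ($path, $callback) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $callback->($line);
    }
    close($fh);
    return;
}
```

The regex in the question matches across line boundaries, so slurping may be a reasonable fit there. Line-by-line reading matters more once the input files grow too large to hold in memory comfortably.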