Perl and Regex - Parsing values from a .csv

Date: 2014-12-02 13:52:21

Tags: regex perl

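I need to create a Perl script that reads the most recently modified file in a given folder (the file is always a .csv) and parses the values in its columns, so that I can load them into a MySQL database.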

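The main problem is that I need to separate the date from the hour, and the country from the name (CHN, DEU and JPN stand for China, Germany and Japan).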

They arrive joined together, as in the example below:

"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"
"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"
"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"

So far I can split the file into lines, but how do I make it understand that every value enclosed in "" and separated by , should be inserted into my array?

my %date;
my %hour;
my %country;
my %name;
my %percentage_one;
my %percentage_two;

# Selects latest file in the given directory
my $files = File::DirList::list('/home/cvna/IN/SCRIPTS/zabbix/roaming/tratamento_IAS/GPRS_IN', 'M');
my $file = $files->[0]->[13];

open(CONFIG_FILE,$file);
while (<CONFIG_FILE>){
    # Splits the file into various lines
    @lines = split(/\n/,$_);
    # For each line that i get...
    foreach my $line (@lines){
        # I need to split the values between , without the ""
        # And separating Hour from Date, and Name from Country
        @aux = split(/......./,$line)
    }
}
close(CONFIG_FILE);

2 answers:

Answer 0 (score: 5)

readline (<>) reads only one line at a time, so there is no need to split on newlines. Rather than fixing your code, though, use Text::CSV:

#!/usr/bin/perl
use 5.010;
use warnings;
use strict;

use Text::CSV;

my $csv = 'Text::CSV'->new({ binary => 1 }) or die 'Text::CSV'->error_diag;

while (my $row = $csv->getline(*DATA)) {
    my ($date, $time)    = split / /,   $row->[0];
    my ($country, $name) = split / - /, $row->[3];
    print "Date: $date\tTime: $time\tCountry: $country\tName: $name\n";
}

__DATA__
"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"
"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"
"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"

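On the point that readline returns one line per call: here is a minimal sketch using only core Perl. The sample text and the in-memory filehandle are my own assumptions for illustration; they let the snippet run without a file on disk.

```perl
use strict;
use warnings;

# Hypothetical sample data held in a string, so no file is needed.
my $text = qq("a","1"\n"b","2"\n"c","3"\n);

# Open an in-memory filehandle on the string (a core Perl feature).
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my @records;
while ( my $line = <$fh> ) {    # each iteration yields exactly one line
    chomp $line;                # drop the trailing newline
    push @records, $line;
}
close $fh;

print scalar(@records), " lines read\n";
```

Since every iteration already holds a single line, a further split on \n inside the loop has nothing left to do.
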
Answer 1 (score: 1)

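Looking at your code, you seem to be fairly new to Perl. The Text::CSV module is a good solution, but unfortunately it is not a standard module. You need to install it from CPAN. That isn't difficult, but it may require you to be an administrator on the machine.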

The Text::ParseWords module is a standard module, and it can handle quoted words much as Text::CSV does.

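Basically, you need to split the line (I use the parse_line function). The first argument is the comma, which is what I want to split the line on. Unlike split itself, parse_line does not split inside quoted fields, and it handles backslash-escaped quotes. This is very similar to what Text::CSV does.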
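
To make the difference from a plain split concrete, here is a small sketch. The row is hypothetical: I added a comma inside the fourth field (", LTD"), which is not in the original data, purely to show where split goes wrong.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical row: the fourth field contains a comma inside the quotes.
my $line = qq("02/12/2014 09:00:00","3600","1","CHN - NAME1, LTD","0%","0%"\n);
chomp $line;    # otherwise the newline ends up in the last field

# Plain split breaks the quoted field apart and keeps the quotes.
my @naive = split /,/, $line;

# parse_line(delimiter, keep, line): a false "keep" strips the quotes.
my @fields = parse_line( ',', 0, $line );

print scalar(@naive),  " fields from split\n";
print scalar(@fields), " fields from parse_line\n";
print "Fourth field: $fields[3]\n";
```

The second argument (0 here) is the "keep" flag; passing a true value would leave the quotes in the returned fields.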

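Once the line is split, you need to separate the date from the time, and the country from the name. In my example I show two ways of doing that: one uses split, the other uses a regular-expression match. Either one will work.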

use strict;             # Lets you know when you misspell variable names
use warnings;           # Warns of issues (e.g. using undefined variables)
use feature qw(say);    # Lets you use 'say' instead of 'print' (no \n needed)
use Text::ParseWords;

while ( my $line = <DATA> ) {
    chomp $line;        # Otherwise the newline stays attached to the last field
    my ($date_time, $foo, $bar, $country_name, $percent1, $percent2)
            = parse_line ',', 0, $line;
    my ($date, $time) = split /\s+/, $date_time;
    my ($country, $name) = $country_name =~ m/(.+) - (.*)/;
    say "$date, $time, $country, $name";
}

__DATA__
"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"
"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"
"02/12/2014 09:00:00","3600","1","JPN - NAME3","0%","100%"

In your actual program, you will open your file, and you need to make sure the open succeeded. You can test for that yourself, or use autodie:

use strict;             # Lets you know when you misspell variable names
use warnings;           # Warns of issues (e.g. using undefined variables)
use feature qw(say);    # Lets you use 'say' instead of 'print' (no \n needed)
use Text::ParseWords;
use autodie;

open my $config_file, "<", $file;  # No need for testing thanks to use autodie!

# What you need to do if you don't use autodie
# open my $config_file, "<", $file or die qq(Can't open "$file" for reading);

while ( my $line = <$config_file> ) {
    chomp $line;        # Otherwise the newline stays attached to the last field
    my ($date_time, $foo, $bar, $country_name, $percent1, $percent2)  
            = parse_line ',', 0, $line;
    my ($date, $time) = split /\s+/, $date_time;
    my ($country, $name) = $country_name =~ m/(.+) - (.*)/;
    say "$date, $time, $country, $name";  # Show fields were correctly parsed.
}

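It looks like you want to store the data, and I see you have several hashes, which I bet you are trying to keep in parallel. Take a look at how to use references to build more complex structures: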

my %data;   #Where I'll be storing the data...
$data{$key}->{DATE} = $date;
$data{$key}->{HOUR} = $hour;
$data{$key}->{COUNTRY} = $country;
...

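Now all of your data is in %data. You can pass it from one place to another in your program without worrying about whether you have updated every single hash.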
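
Putting the parsing and the %data idea together might look roughly like this. It is only a sketch: using the name as the hash key is my assumption (the snippet above leaves $key unspecified), and the PERCENTAGE_ONE / PERCENTAGE_TWO field names are made up to mirror the hashes in the question.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Rows taken from the question's sample data.
my @lines = (
    '"02/12/2014 09:00:00","3600","1","CHN - NAME1","0%","0%"',
    '"02/12/2014 09:00:00","3600","1","DEU - NAME2","10%","75.04%"',
);

my %data;
for my $line (@lines) {
    my ( $date_time, undef, undef, $country_name, $percent1, $percent2 )
        = parse_line( ',', 0, $line );
    my ( $date, $hour ) = split /\s+/, $date_time;
    my ( $country, $name ) = $country_name =~ m/(.+) - (.*)/;

    # Assumption: the name identifies a row, so it serves as the key.
    my $key = $name;
    $data{$key} = {
        DATE           => $date,
        HOUR           => $hour,
        COUNTRY        => $country,
        PERCENTAGE_ONE => $percent1,
        PERCENTAGE_TWO => $percent2,
    };
}

# One structure now carries every field of every row.
for my $key ( sort keys %data ) {
    my $row = $data{$key};
    print "$key: $row->{COUNTRY} $row->{DATE} $row->{HOUR} $row->{PERCENTAGE_TWO}\n";
}
```

If names can repeat across rows, a different key (or an array of hash references) would be needed so that later rows do not overwrite earlier ones.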

Once you get the hang of references, you can move on to writing object-oriented Perl code.
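
As a rough idea of where that leads, a row could be wrapped in a small class. This is just a sketch: the package name Roaming::Record and its accessors are hypothetical and not part of the original answer.

```perl
use strict;
use warnings;

package Roaming::Record;    # hypothetical class name, for illustration only

# Constructor: takes named arguments and blesses a hash reference.
sub new {
    my ( $class, %args ) = @_;
    my $self = {
        date    => $args{date},
        hour    => $args{hour},
        country => $args{country},
        name    => $args{name},
    };
    return bless $self, $class;
}

# Read-only accessors.
sub date    { return $_[0]{date} }
sub hour    { return $_[0]{hour} }
sub country { return $_[0]{country} }
sub name    { return $_[0]{name} }

package main;

my $record = Roaming::Record->new(
    date    => '02/12/2014',
    hour    => '09:00:00',
    country => 'JPN',
    name    => 'NAME3',
);
print $record->country, " - ", $record->name, "\n";
```

An object is still just a reference underneath, which is why references come first.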

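Get a good book on Modern Perl. Perl coding techniques have changed a lot since Perl 5 was released. Unfortunately, most people never learn the way Perl should be written, because they learn from old books lying around, or from looking at old code riddled with Perl 3 and Perl 4 bugs (pun intended). Perl is a flexible and powerful language that will quickly give you enough rope to hang yourself. Learning good programming techniques will let you write more complex and more comprehensive programs that are actually easier to read and maintain.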


An almost complete program...

Here is a program that finds the newest file in a particular directory, then reads that file in and parses its lines.

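I'm using the -M file test. This file test returns the file's last-modification time, expressed as the age of the file in days, measured from the moment the program started. For example, a file last modified two and a half days ago returns 2.5, while a file last modified one day and four hours ago returns 1.16666667. You can use this to compare the ages of various files.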
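
Here is a small self-contained sketch of that comparison. The temporary directory, the two file names, and the back-dated timestamps are all assumptions I made up so the snippet can run anywhere; only the use of -M reflects the approach described above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a throwaway directory with two files so the sketch is self-contained.
my $dir = tempdir( CLEANUP => 1 );
my %age_in_seconds = ( 'old.csv' => 2.5 * 24 * 3600, 'new.csv' => 3600 );

my @paths;
for my $file_name ( keys %age_in_seconds ) {
    my $path = "$dir/$file_name";
    open my $fh, '>', $path or die qq(Cannot create "$path": $!);
    close $fh;

    # Back-date the modification time relative to script start time ($^T).
    my $mtime = $^T - $age_in_seconds{$file_name};
    utime $mtime, $mtime, $path or die qq(Cannot set times on "$path": $!);
    push @paths, $path;
}

# The smallest -M value belongs to the most recently modified file.
my ($newest) = sort { ( -M $a ) <=> ( -M $b ) } @paths;

printf "%s is %.2f days old\n", $newest, -M $newest;
```

Sorting is convenient for a handful of files; the loop in the program below does the same job in a single pass without building a list.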

This program works on Perl 5.8.8 without installing any new modules, and I have tested it against data I made up.

You can see that I use open ... or die ...; without any problems. Are you getting some other error? Do you have use strict; and use warnings; set in your program?

#! /usr/bin/env perl
#

use strict;             # Lets you know when you misspell variable names
use warnings;           # Warns of issues (e.g. using undefined variables)
use Text::ParseWords;
use Benchmark;

use constant {
    DATA_FILE_DIR => "temp",
};

#
# Find newest file in the directory
#

opendir my $data_dir, DATA_FILE_DIR
        or die qq(Cannot open directory for reading.);

my $newest_file;
while ( my $file = readdir $data_dir ) { 
    next if $file eq "." or $file eq "..";
    my $full_name = DATA_FILE_DIR . "/" . $file;
    if ( not defined $newest_file
            or -M $full_name < -M $newest_file ) {
        $newest_file = $full_name;
    }
}
die "No files found in " . DATA_FILE_DIR . "\n" unless defined $newest_file;
print qq(Using file "$newest_file"\n);
closedir $data_dir;

open my $file, "<", $newest_file
        or die qq(Cannot open file "$newest_file" for reading.);
while ( my $line = <$file> ) {
    chomp $line;    # Otherwise the newline stays attached to $percent2
    # Split the line into its fields
    my ($date_time, $foo, $bar, $country_name, $percent1, $percent2) 
            = parse_line ',', 0, $line;
    # Split the DATE/TIME field
    my ($date, $time) = split /\s+/, $date_time;

    # Split the Country/Name field
    my ($country, $name) = $country_name =~ m/(.+) - (.*)/;

    # Print statement merely shows that these four fields are truly split.
    print "$date, $time, $country, $name\n";
}