Perl file parser for dynamic files

Time: 2013-10-02 08:07:14

Tags: perl parsing

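I'm new to Perl and could really use some help writing a file parser. The file is structured like this (X is a number that changes from file to file and gives the number of following lines that contain column headings):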

X,1,0,0,2,0,0,2,0,1,2,0,2,2,0,3,2,0,4,2,1,0,2,2,0,2,3,0,2,4,0,2,4,1,2,4,2,2,4,3,2,5,0,2,5,1,2,5,2,2,5,3,3,1,0,3
# Col_heading1
# Col_heading2
# Col_heading3 //Continues X rows
# Col_headingX 
# 2013 138 22:42:21 - Random text
# 2013 138 22:42:22 : Random text
# 2013 138 22:42:23 : Random text
2013 138 22:42:26, 10, 10, 10, 20, //continues X values
2013 138 22:42:27, 10, 10, 10, 20, 
2013 138 22:42:28, 10, 10, 10, 20, 
# 2013 138 22:42:31 - Random text
# 2013 138 22:42:32 : Random text
# 2013 138 22:42:33 - Event $eventname starting ($eventid) //$eventname and $eventid changes for each file
2013 138 22:42:35, 10, 10, 10, 20, 
2013 138 22:42:36, 10, 10, 10, 20, 
2013 138 22:42:37, 10, 10, 10, 20, 
2013 138 22:42:38, 10, 10, 10, 20, 
2013 138 22:42:39, 10, 10, 10, 20, 
# 2013 138 22:42:40 : Random text
2013 138 22:42:41, 10, 10, 10, 20, 
2013 138 22:42:42, 10, 10, 10, 20, 
# 2013 138 22:42:45 - Event $eventname ended ($eventid) //$eventname and $eventid changes for each file
2013 138 22:42:46, 10, 10, 10, 20, 
2013 138 22:42:47, 10, 10, 10, 20, 
# 2013 138 22:42:48 : Random text

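The parser needs to transpose the Col_headings into tab-separated values on a single line, and list every line that does not start with # between

# 2013 138 22:42:33 - Event $eventname starting ($eventid)

and

# 2013 138 22:42:45 - Event $eventname ended ($eventid)

The values must also be changed from comma-separated to tab-separated.

The output file should look like this: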

Any help is much appreciated!

1 Answer:

Answer 0 (score: 1)

After opening the file, you can get the number from the first line:

my ($heading_count) = split /,/, <$fh>;
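To try this line in isolation, here is a small sketch that reads from an in-memory filehandle instead of a real file; the sample string (with X = 4) is hypothetical. Because the assignment is in list context, only the first comma-separated field ends up in $heading_count, and readline is called in scalar context, so exactly one line is consumed.

```perl
use strict;
use warnings;

# Hypothetical first line of a file where X = 4.
my $sample = "4,1,0,0,2,0,0\n# Col_heading1\n";

# Opening a reference to a scalar gives an in-memory filehandle.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

# split returns a list; the list assignment keeps only the first field.
my ($heading_count) = split /,/, <$fh>;

print "Heading count: $heading_count\n";
```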

Then loop to read the headings:

my @headings = qw(Time);
for (1..$heading_count) {
    chomp(my $heading = <$fh>); # Chomp to remove the newline
    # Process it somehow, e.g. remove the leading # and whitespace, plus any trailing whitespace
    $heading =~ s/^#\s+//;
    $heading =~ s/\s+$//;
    push @headings, $heading;
}
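A sketch of the heading step on a hypothetical three-heading sample, which also shows the transpose the question asks for: joining the collected headings with tabs gives one header line. The defined check and the trailing-whitespace strip are additions, assuming files may be truncated and headings may carry trailing spaces (as "# Col_headingX " does in the question's sample).

```perl
use strict;
use warnings;

# Hypothetical input: X = 3 heading lines follow the first line.
my $sample = "3,1,0,0\n# Col_heading1\n# Col_heading2\n# Col_heading3 \n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my ($heading_count) = split /,/, <$fh>;

# "Time" comes first because every data row starts with a timestamp.
my @headings = qw(Time);
for (1 .. $heading_count) {
    my $heading = <$fh>;
    last unless defined $heading;   # stop early if the file is truncated
    chomp $heading;
    $heading =~ s/^#\s*//;          # drop the leading "# "
    $heading =~ s/\s+$//;           # drop trailing whitespace
    push @headings, $heading;
}

# Transpose: one tab-separated line holding all the headings.
my $header_line = join("\t", @headings) . "\n";
print $header_line;
```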

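Once that's done, loop through the rest of the file, parsing and printing any lines between the start and end patterns. Here's a fairly simple example to get you started: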

print join("\t", @headings), "\n"; # print out the headings on one tab-separated line
my $in_event = 0; # State variable to track if we're in an event
while (<$fh>) {
    if (/Event (.*) starting \((.*)\)/) { # Watch for the event starting; event name is now in $1, event id in $2
        $in_event = 1;
        next;
    }
    next unless $in_event; # Skip if not in an event yet
    last if /Event .* ended/; # Stop reading if the event ends
    next if /^#/; # Skip comments

    s/,[ \t]*/\t/g; # Replace commas (and the spaces after them) with tabs, keeping the newline
    print; # Print the row
}
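Putting the pieces together, here is a minimal end-to-end sketch. It assumes an abridged, hypothetical version of the question's sample (X = 2, an event named demo with id 42) and reads it from an in-memory string so it runs as-is. Unlike the loop above, it collects the output in $out, keeps the captured event name and id in $event_name and $event_id, and removes each row's trailing comma before splitting, so rows don't end in an empty tab-separated field.

```perl
use strict;
use warnings;

# Abridged, hypothetical sample. For a real file use: open my $fh, '<', $path or die ...
my $input = <<'END';
2,1,0,0
# Col_heading1
# Col_heading2
# 2013 138 22:42:21 - Random text
2013 138 22:42:26, 10, 20,
# 2013 138 22:42:33 - Event demo starting (42)
2013 138 22:42:35, 10, 20,
# 2013 138 22:42:40 : Random text
2013 138 22:42:41, 11, 21,
# 2013 138 22:42:45 - Event demo ended (42)
2013 138 22:42:46, 10, 20,
END

open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my $out = '';

# Header: the count from line 1, then that many "# heading" lines.
my ($heading_count) = split /,/, <$fh>;
my @headings = qw(Time);
for (1 .. $heading_count) {
    chomp(my $heading = <$fh>);
    $heading =~ s/^#\s*//;
    $heading =~ s/\s+$//;
    push @headings, $heading;
}
$out .= join("\t", @headings) . "\n";

# Body: keep non-comment rows between the starting and ended markers.
my ($event_name, $event_id);
my $in_event = 0;
while (my $line = <$fh>) {
    if ($line =~ /Event (.*) starting \((.*)\)/) {
        ($event_name, $event_id) = ($1, $2);
        $in_event = 1;
        next;
    }
    next unless $in_event;
    last if $line =~ /Event .* ended/;
    next if $line =~ /^#/;

    chomp $line;
    $line =~ s/,?\s*$//;    # drop the trailing comma and whitespace
    $out .= join("\t", split /,\s*/, $line) . "\n";
}
close $fh;

print $out;
```

Splitting on /,\s*/ rather than on whitespace keeps the space-separated timestamp ("2013 138 22:42:35") together as the first field.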

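With this approach you'll find that the column headings don't line up with the data, because the values have varying lengths, so you'll need to adjust it to get exactly what you want, or look at Text::CSV for parsing the rows (or just use split) and something like Text::Table for producing a properly aligned table.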
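If you would rather avoid non-core modules, one plain-Perl alternative to Text::Table (a different technique, not what this answer uses) is to compute each column's width as the length of its longest cell and pad every cell with sprintf's %-*s. The rows below are hypothetical parser output. Note that the result is space-aligned for reading, not tab-separated, so keep the tab-separated version if another program consumes the file.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical parser output: the heading row first, then data rows.
my @rows = (
    [ 'Time', 'Col_heading1', 'Col_heading2' ],
    [ '2013 138 22:42:35', '10', '20' ],
    [ '2013 138 22:42:41', '11', '21' ],
);

# Width of each column = length of its longest cell.
my $ncols = max map { scalar(@$_) } @rows;
my @width;
for my $c (0 .. $ncols - 1) {
    $width[$c] = max map { length($_->[$c] // '') } @rows;
}

# Left-align each cell in a field of its column's width, two spaces apart.
my $table = '';
for my $row (@rows) {
    my @cells = map { sprintf '%-*s', $width[$_], $row->[$_] // '' } 0 .. $ncols - 1;
    (my $line = join '  ', @cells) =~ s/\s+$//;   # trim padding after the last column
    $table .= "$line\n";
}
print $table;
```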