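I'm new to Perl and could really use some help writing a file parser. The file is structured like this (X is a number that changes from file to file and gives the number of following lines that contain column headings):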
X,1,0,0,2,0,0,2,0,1,2,0,2,2,0,3,2,0,4,2,1,0,2,2,0,2,3,0,2,4,0,2,4,1,2,4,2,2,4,3,2,5,0,2,5,1,2,5,2,2,5,3,3,1,0,3
# Col_heading1
# Col_heading2
# Col_heading3 //Continues X rows
# Col_headingX
# 2013 138 22:42:21 - Random text
# 2013 138 22:42:22 : Random text
# 2013 138 22:42:23 : Random text
2013 138 22:42:26, 10, 10, 10, 20, //continues X values
2013 138 22:42:27, 10, 10, 10, 20,
2013 138 22:42:28, 10, 10, 10, 20,
# 2013 138 22:42:31 - Random text
# 2013 138 22:42:32 : Random text
# 2013 138 22:42:33 - Event $eventname starting ($eventid) //$eventname and $eventid changes for each file
2013 138 22:42:35, 10, 10, 10, 20,
2013 138 22:42:36, 10, 10, 10, 20,
2013 138 22:42:37, 10, 10, 10, 20,
2013 138 22:42:38, 10, 10, 10, 20,
2013 138 22:42:39, 10, 10, 10, 20,
# 2013 138 22:42:40 : Random text
2013 138 22:42:41, 10, 10, 10, 20,
2013 138 22:42:42, 10, 10, 10, 20,
# 2013 138 22:42:45 - Event $eventname ended ($eventid) //$eventname and $eventid changes for each file
2013 138 22:42:46, 10, 10, 10, 20,
2013 138 22:42:47, 10, 10, 10, 20,
# 2013 138 22:42:48 : Random text
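The parser needs to transpose the Col_headings into tab-separated values on a single row, and list all lines that do not start with # between
# 2013 138 22:42:33 - Event $eventname starting ($eventid)
and
# 2013 138 22:42:45 - Event $eventname ended ($eventid)
The values must also be changed from comma-separated to tab-separated.
The output file should consist of the heading row followed by those data lines.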
Any help is greatly appreciated!
Answer 0 (score: 1)
Once the file is open, you can get the heading count from the first line:
my ($heading_count) = split /,/, <$fh>;
Then loop to collect the headings:
my @headings = qw(Time);
for (1..$heading_count) {
    chomp(my $heading = <$fh>); # Chomp to remove the newline
    # Process it somehow, e.g. remove leading # + whitespace
    $heading =~ s/^#\s+//;
    push @headings, $heading;
}
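The two snippets above can be combined into one small subroutine. This is only a minimal sketch: the name `read_headings` and the in-memory sample data are hypothetical, and it assumes the first field of the first line is always a non-negative integer.

```perl
use strict;
use warnings;

# Hypothetical helper: read the heading count from the first line,
# then that many "# heading" lines. Returns the list of headings.
sub read_headings {
    my ($fh) = @_;

    my $first = <$fh>;
    die "Empty input\n" unless defined $first;
    chomp $first;

    my ($heading_count) = split /,/, $first;
    die "First field is not a number\n"
        unless defined $heading_count && $heading_count =~ /^\d+$/;

    my @headings = ('Time');    # seeded with "Time", as in the snippet above
    for my $n (1 .. $heading_count) {
        my $heading = <$fh>;
        die "Expected $heading_count headings, got " . ($n - 1) . "\n"
            unless defined $heading;
        chomp $heading;
        $heading =~ s/^#\s+//;  # strip the leading "#" and whitespace
        push @headings, $heading;
    }
    return @headings;
}

# Made-up sample input, read through an in-memory filehandle.
my $sample = "3,1,0,0\n# Col_heading1\n# Col_heading2\n# Col_heading3\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @headings = read_headings($fh);
print join("\t", @headings), "\n";
```

Dying on a short file is a design choice for this sketch; a real parser might prefer to warn and carry on with the headings it found.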
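Once that's done, loop over the rest of the file, parsing and printing any lines between the start and end patterns. Here is a fairly simple example to get you started: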
print join("\t", @headings), "\n"; # print out the headings (parentheses keep "\n" out of the join)
my $in_event = 0; # State variable to track if we're in an event
while (<$fh>) { # Keep reading from the same filehandle as above
    if (/Event (.*) starting \((.*)\)/) { # Watch for the event starting, event name is now in $1, event id in $2
        $in_event = 1;
        next;
    }
    next unless $in_event; # Skip if not in an event yet
    last if /Event .* ended/; # Stop reading if the event ends
    next if /^#/; # Skip comments
    s/,\s?/\t/g; # Replace commas with tabs
    print; # Print the row
}
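If you would rather collect the rows than print them straight away, the same state-variable loop can be wrapped in a subroutine. The following is a sketch under assumptions: `extract_event_rows` is a hypothetical name, the sample data (event name `Demo`, id `42`) is made up, and it assumes only one event per file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the event name, the event id and a reference
# to the data rows found between the "starting" and "ended" markers.
sub extract_event_rows {
    my ($fh) = @_;
    my ($name, $id, @rows);
    my $in_event = 0;

    while (my $line = <$fh>) {
        chomp $line;
        if (!$in_event) {
            if ($line =~ /^#.*Event (.*) starting \((.*)\)/) {
                ($name, $id) = ($1, $2);
                $in_event = 1;
            }
            next;                                # nothing before the event is kept
        }
        last if $line =~ /^#.*Event .* ended/;   # event finished
        next if $line =~ /^#/;                   # comment inside the event
        # split drops trailing empty fields, so a trailing comma is harmless
        push @rows, [ split /\s*,\s*/, $line ];
    }
    return ($name, $id, \@rows);
}

# Made-up sample input, read through an in-memory filehandle.
my $sample = <<'END';
# 2013 138 22:42:32 : Random text
2013 138 22:42:28, 10, 10, 10, 20,
# 2013 138 22:42:33 - Event Demo starting (42)
2013 138 22:42:35, 10, 10, 10, 20,
# 2013 138 22:42:40 : Random text
2013 138 22:42:41, 10, 10, 10, 20,
# 2013 138 22:42:45 - Event Demo ended (42)
2013 138 22:42:46, 10, 10, 10, 20,
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my ($name, $id, $rows) = extract_event_rows($fh);
print join("\t", @$_), "\n" for @$rows;   # tab-separated data rows
```

Returning the rows as array references keeps the fields separate, which makes it easier to reorder or reformat columns later.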
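You'll find that with this approach the column headings don't line up properly with the data because of their varying lengths, so you'll need to tweak it to get what you want, or look at something like Text::CSV to parse the lines (or use split) and Text::Table to generate a proper table.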
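If installing CPAN modules is not an option, the `split` route can be sketched with core Perl only. The helper below, `format_table`, is a hypothetical stand-in written for illustration; it is not the Text::Table API. It pads every column to the width of its widest cell, and the sample headings and rows are made up.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: takes a list of array refs (heading row first)
# and returns one formatted line per row, columns padded to equal width.
sub format_table {
    my @rows = @_;

    # First pass: find the widest cell in each column.
    my @width;
    for my $row (@rows) {
        for my $i (0 .. $#$row) {
            $width[$i] = max($width[$i] // 0, length $row->[$i]);
        }
    }

    # Second pass: left-justify each cell and join with two spaces.
    return map {
        my $row  = $_;
        my $line = join '  ',
            map { sprintf '%-*s', $width[$_], $row->[$_] } 0 .. $#$row;
        $line =~ s/\s+$//;    # no trailing padding after the last column
        $line;
    } @rows;
}

my @headings = ('Time', 'Col_heading1', 'Col_heading2');
my @data = map { [ split /\s*,\s*/, $_ ] } (
    '2013 138 22:42:35, 10, 20,',
    '2013 138 22:42:36, 10, 20,',
);
print "$_\n" for format_table(\@headings, @data);
```

Note that this produces space-aligned output for reading on screen; if the consumer of the file expects real tabs, stick with `join "\t"`.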