I want to parse a text file using Perl. The file contains a log extracted from some HTML files, as shown below:
Details from /projects/git/Changelog.html file:
NEW_FEATURES: <a href="http://jira.xyz.com/browse/JIRA-4208">JIRA-4208</a><span style='mso-spacerun:yes'> </span>Add New Config C support in code
BUG_FIX: <a href="http://jira.xyz.com/browse/BUGJIRA-31">BUGJIRA-31</a><span style='mso-spacerun:yes'> </span>Bugfix of some old bug
NEW_FEATURES: <a href="http://jira.xyz.com/browse/ZEERA-273">ZEERA-273</a><span style='mso-spacerun:yes'> </span>Add support for some other feature.
Details from /projects/git/Changelog2.html file:
BUG_FIX: <a href="http://jira.xyz.com/browse/BUGJIRA-33">BUGJIRA-33</a><span style='mso-spacerun:yes'> </span>Bugfix of an issue
NEW_FEATURES: <a href="http://jira.xyz.com/browse/JIRA-4209">JIRA-4209</a><span style='mso-spacerun:yes'> </span>Add New Config D support in code
Each line contains a bug number and its description.
After parsing, the EXPECTED OUTPUT is:
JIRA-4208, BUGJIRA-31, ZEERA-273, BUGJIRA-33, JIRA-4209 : Add New Config C support in code, Bugfix of some old bug, Add support for some other feature, Bugfix of an issue, Add New Config D support in code
That is, all the bug numbers followed by their descriptions.
If possible, I would like to write the output to another file, output.txt.
EDIT-1:
My code is as follows:
#!/usr/bin/perl
open (FILE, 'input_file1.txt') or die "Could not read from file, exit...";
while(<FILE>)
{
chomp;
($junk0,$junk1,$junk2,$junk3,$junk4,$BUG_NUMBR) = split /[:<="">]+/,$_;
print "$BUG_NUMBR \n";
}
close FILE;
exit;
Its output is completely different from the expected output shown above. I cannot work out a suitable regular expression for the second part of the expected output, which is the short description of each bug.
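Answer 0: (score: 0)
You don't need an additional regular expression. Your split pattern is interesting, but it gets the job done.
You can take the rest of the split result as well. I replaced your $junk variables with an array. Perl lets you use the index -1 to get the last element counting from the right, so extracting the text is trivial, because it comes after the last >.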
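use strict;
use warnings;
my ( @numbers, @text );
while (my $line = <DATA>) {
    chomp $line;
    # Split on runs of the delimiter characters : < = " >
    my @stuff = split /[:<="">]+/, $line;
    push @numbers, $stuff[5];    # sixth field: the bug number (the link text)
    push @text, $stuff[-1];      # last field: the description
}
# Print all bug numbers, then all descriptions
print join ', ', @numbers;
print ' : ';
print join ', ', @text;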
__DATA__
NEW_FEATURES: <a href="http://jira.xyz.com/browse/JIRA-4208">JIRA-4208</a><span style='mso-spacerun:yes'> </span>Add New Config C support in code
BUG_FIX: <a href="http://jira.xyz.com/browse/BUGJIRA-31">BUGJIRA-31</a><span style='mso-spacerun:yes'> </span>Bugfix of some old bug
NEW_FEATURES: <a href="http://jira.xyz.com/browse/ZEERA-273">ZEERA-273</a><span style='mso-spacerun:yes'> </span>Add support for some other feature.
BUG_FIX: <a href="http://jira.xyz.com/browse/BUGJIRA-33">BUGJIRA-33</a><span style='mso-spacerun:yes'> </span>Bugfix of an issue
NEW_FEATURES: <a href="http://jira.xyz.com/browse/JIRA-4209">JIRA-4209</a><span style='mso-spacerun:yes'> </span>Add New Config D support in code
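I also added strict and warnings, and made your variables lexical.
Also note that your code will break if the description contains a literal >, <, :, = or double quote, because each of those is a split delimiter. It is a weird format, and an HTML parser won't really help you here.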
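If descriptions might contain those delimiter characters, one alternative is to capture the two fields directly with a single match instead of splitting. The following is only a sketch, not part of the original answer: the helper name parse_log_line and the sample line with a > in its description are made up for illustration, and it assumes every log line has the `<a ...>NUMBER</a>` link followed by a `</span>` before the description.

```perl
use strict;
use warnings;

# Hypothetical helper (assumption, not from the answer above): returns the
# bug number and description for one log line, or an empty list on no match.
sub parse_log_line {
    my ($line) = @_;
    # Capture the link text of the <a> element, then everything after the
    # last </span> on the line.
    if ( $line =~ m{<a\s[^>]*>([^<]+)</a>.*</span>(.*)$} ) {
        return ( $1, $2 );
    }
    return;
}

# Made-up sample line whose description contains ':' and '>'.
my ( $number, $text ) = parse_log_line(
    q{BUG_FIX: <a href="http://jira.xyz.com/browse/BUGJIRA-31">BUGJIRA-31</a><span style='mso-spacerun:yes'> </span>Fix: handle a > b}
);
print "$number : $text\n";    # prints "BUGJIRA-31 : Fix: handle a > b"
```

Lines that do not match, such as the "Details from ... file:" headers, yield an empty list, so the caller can skip them with a simple check.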
Answer 1: (score: 0)
The code for the problem described above is as follows:
#!/usr/bin/perl
use strict;
use warnings;
# Three-argument open with a lexical filehandle
open( my $fh, '<', 'perl_input_file1.txt' ) or die "Cannot open perl_input_file1.txt: $!";
my ( @numbers, @text );
while ( my $line = <$fh> ) {
    chomp $line;
    next if $line =~ /^Details/;    # skip the "Details from ... file:" header lines
    my @stuff = split /[:<="">]+/, $line;
    push @numbers, $stuff[5];     # bug number
    push @text,    $stuff[-1];    # description
}
close $fh;
print join ', ', @numbers;
print ': ';
print join ', ', @text;
print "\n";
The output of this code is:
JIRA-4208, BUGJIRA-31, ZEERA-273, BUGJIRA-33, JIRA-4209: Add New Config C support in code, Bugfix of some old bug, Add support for some other feature, Bugfix of an issue, Add New Config D support in code
This matches the expected output described in the question.
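The question also asks for the result to be written to output.txt. A minimal sketch of that step follows. It assumes the two arrays have already been filled by the parsing loop; the helper name format_summary and the two sample entries are placeholders for illustration, and the ' : ' separator follows the expected output in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: formats the two lists as "numbers : descriptions".
sub format_summary {
    my ( $numbers, $text ) = @_;
    return join( ', ', @$numbers ) . ' : ' . join( ', ', @$text ) . "\n";
}

# Sample values standing in for the arrays built by the parsing loop.
my @numbers = ( 'JIRA-4208', 'BUGJIRA-31' );
my @text    = ( 'Add New Config C support in code', 'Bugfix of some old bug' );

# Three-argument open with a lexical filehandle; '>' creates or truncates the file.
open( my $out, '>', 'output.txt' ) or die "Cannot write output.txt: $!";
print {$out} format_summary( \@numbers, \@text );
close($out) or die "Cannot close output.txt: $!";
```

Using '>>' instead of '>' would append to an existing file rather than overwrite it.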
Thanks again to @simbabque for the guidance and the approach.
Regards