在以下几行中,我如何将行存储在" 描述:"和" 标记:"在使用REGEX PERL的变量中,使用什么是好的数据类型,字符串或列表或其他什么?
(我试图在Perl中编写一个程序,用Debian包信息提取文本文件的信息,并将其转换为RDF(OWL)文件(本体)。)
描述用于解码ATSC A / 52流的库(开发) liba52是一个用于解码ATSC A / 52流的免费库。 A / 52标准是 用于各种应用,包括数字电视和DVD。它是 也被称为AC-3。
此包包含开发文件。 主页:http://liba52.sourceforge.net/
标记: devel :: library,role :: devel-lib
到目前为止我写的代码是:
#!/usr/bin/perl
open(DEB,"Packages");
open(ONT,">>debianmodelling.txt");
$i=0;
while(my $line = <DEB>)
{
if($line =~ /Package/)
{
$line =~ s/Package: //;
print ONT ' <package rdf:ID="instance'.$i.'">';
print ONT ' <name rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</name>'."\n";
}
elsif($line =~ /Priority/)
{
$line =~ s/Priority: //;
print ONT ' <priority rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</priority>'."\n";
}
elsif($line =~ /Section/)
{
$line =~ s/Section: //;
print ONT ' <Section rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</Section>'."\n";
}
elsif($line =~ /Maintainer/)
{
$line =~ s/Maintainer: //;
print ONT ' <maintainer rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</maintainer>'."\n";
}
elsif($line =~ /Architecture/)
{
$line =~ s/Architecture: //;
print ONT ' <architecture rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</architecture>'."\n";
}
elsif($line =~ /Version/)
{
$line =~ s/Version: //;
print ONT ' <version rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</version>'."\n";
}
elsif($line =~ /Provides/)
{
$line =~ s/Provides: //;
print ONT ' <provides rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</provides>'."\n";
}
elsif($line =~ /Depends/)
{
$line =~ s/Depends: //;
print ONT ' <depends rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</depends>'."\n";
}
elsif($line =~ /Suggests/)
{
$line =~ s/Suggests: //;
print ONT ' <suggests rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</suggests>'."\n";
}
elsif($line =~ /Description/)
{
$line =~ s/Description: //;
print ONT ' <Description rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</Description>'."\n";
}
elsif($line =~ /Tag/)
{
$line =~ s/Tag: //;
print ONT ' <Tag rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'.$line.'</Tag>'."\n";
print ONT ' </Package>'."\n\n";
}
$i=$i+1;
}
答案 0 :(得分:17)
my $desc = "Description:";
my $tag = "Tag:";
$line =~ /$desc(.*?)$tag/;
my $matched = $1;
print $matched;
或
my $desc = "Description:";
my $tag = "Tag:";
my @matched = $line =~ /$desc(.*?)$tag/;
print $matched[0];
或
my $desc = "Description:";
my $tag = "Tag:";
(my $matched = $line) =~ s/$desc(.*?)$tag/$1/;
print $matched;
如果您的描述和标记可能位于不同的行上,则可能需要使用/s
修饰符将其视为单行,因此\n
不会破坏它。例如:
$_=qq{Description:foo
more description on
new line Tag: some
tag};
s/Description:(.*?)Tag:/$1/s; #notice the trailing slash
print;
答案 1 :(得分:4)
假设:
my $example; # holds the example text above
你可以:
(my $result=$example)=~s/^.*?\n(Description:)/$1/s; # strip up to first marker
$result=~s/(\nTag:[^\n]*\n).+$/$1/s; # strip everything after second marker line
或者
(my $result=$example)=~s/^.*?\n(Description:.+?Tag:[^\n]*\n).*$/$1/s;
两者都假设Tag:值包含在一行中。
如果不是这种情况,您可以尝试:
(my $result=$example)=~s/
( # start capture
Description: # literal 'Description:'
.+? # any chars (non-greedy) up to
Tag: # literal 'Tag:'
.+? # any chars up to
)
(?: # either
\n[A-Z][a-z]+\: # another tagged value name
| # or
$ # end of string
)
/$1/sx;
答案 2 :(得分:3)
我认为问题是由段落构成的数据使用行读取循环引起的。如果您可以将文件粘贴到内存中并使用捕获的分隔符应用拆分,则处理将更加顺畅:
#!/usr/bin/perl -w
use strict;
use diagnostics;
use warnings;
use English;
# simple sample sub
my $printhead = sub {
printf "%5s got the tag '%s ...'\n", '', substr( shift, 0, 30 );
};
# map keys/tags? to functions
my %tagsoups = (
'PackageName' => sub {printf "%5s got the name '%s'\n", '', shift;}
, 'Description' => sub {printf "%5s got the description:\n---------\n%s\n----------\n", '', shift;}
, 'Tag' => $printhead
);
# slurp Packages (fallback: parse using $INPUT_RECORD_SEPARATOR = "Package:")
open my $fh, "<", './Packages-00.txt' or die $!;
local $/; # enable localized slurp mode
my $all = <$fh>;
my @pks = split /^(Package):\s+/ms, $all;
close $fh;
# outer loop: Packages
for (my $p = 1, my $n = 0; $p < scalar @pks; $p +=2) {
my $blk = "PackageName: " . $pks[$p + 1];
my @inf = split /\s*^([\w-]+):\s+/ms, $blk;
printf "%3d %s named %s\n", ++$n, $pks[$p], $inf[ 2 ];
# outer loop: key-value-pairs (or whatever they are called)
for (my $x = 1; $x < scalar @inf; $x += 2) {
if (exists($tagsoups{$inf[ $x ]})) {
$tagsoups{$inf[ $x ]}($inf[$x + 1]);
}
}
}
从我的Ubuntu Linux输出缩短的Packages文件:
3 Package named abrowser-3.5-branding
got the PackageName:
---------
abrowser-3.5-branding
----------
got the Description:
---------
dummy upgrade package for firefox-3.5 -> firefox
This is a transitional package so firefox-3.5 users get firefox on
upgrades. It can be safely removed.
----------
4 Package named casper
got the PackageName:
---------
casper
----------
got the Description:
---------
Run a "live" preinstalled system from read-only media
----------
got the Tag:
---------
admin::boot, admin::filesystem, implemented-in::shell, protocol::smb, role::plugin, scope::utility, special::c
ompletely-tagged, works-with-format::iso9660
----------
使用散列函数应用于提取的部分将保留生成xml的细节,使其脱离解析器循环。