I have some tags with values like the following:
<section>
<title id="ABC0123">is The human nervous system?</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
<section>
<title id="DEF0123">Terms for anatomical directions in the nervous system</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
</section>
<section>
<title id="ABC4356">Anatomical terms: is referring to directions</title>
.
.
.
The output I need is as follows:
<section>
<title id="ABC0123">Is the Human Nervous System?</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
</section>
<section>
<title id="DEF0123">Terms for Anatomical Directions in the Nervous System</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
<section>
<title id="ABC4356">Anatomical Terms: Is Referring to Directions</title>
.
.
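How can I do this in Perl? In the output, all prepositions and articles stay lowercase. The situation has now changed slightly, as follows.
The condition is: if a word from @lowercase (say, "is") is the first word of the title and is lowercase, it should be capitalized. Likewise, any @lowercase word that follows a colon should be capitalized.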
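A minimal sketch of that rule as a standalone Perl sub, assuming words are separated by whitespace (runs of whitespace collapse to one space) and that a colon is attached to the end of the word before it, as in "terms:". The sub name title_case and the exact word list are illustrative assumptions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Illustrative list of articles/prepositions to keep lowercase.
my %lowercase = map { $_ => 1 } qw(a an at for in is the to);

# List words are lowercased, except the first word and any word that
# directly follows a word ending in ':'; every other word gets ucfirst.
sub title_case {
    my ($text) = @_;
    my @out;
    my $force_cap = 1;                       # the first word is always capitalized
    for my $word (split ' ', $text) {        # awk-style split on whitespace
        if (!$force_cap && exists $lowercase{ lc($word) }) {
            push @out, lc($word);
        }
        else {
            push @out, ucfirst($word);
        }
        $force_cap = $word =~ /:$/ ? 1 : 0;  # a colon forces the next word up
    }
    return join ' ', @out;
}

print title_case('is The human nervous system?'), "\n";
# prints: Is the Human Nervous System?
print title_case('Anatomical terms: is referring to directions'), "\n";
# prints: Anatomical Terms: Is Referring to Directions
```

Non-list words only get ucfirst (the rest of the word is left alone), so acronyms such as "DNA" are not mangled.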
Answer 0 (score: 2)
Something like this might work:
#!/usr/bin/env perl
use strict;
use warnings;

my $lines = qq#
<title>The human nervous system</title>
<title>Terms for anatomical directions in the nervous system</title>
<title>Anatomical terms referring to directions</title>
#;

foreach my $line ( split(/\n/, $lines) ) {
    $line =~ s|</?title>||g;    # strip the <title> and </title> tags
    if ( $line =~ /\w+/ ) {      # skip blank lines
        print "<title>" . ucfirst(
            join(" ",
                # capitalize every word except those in the list
                map { !/^(in|the|on|or|to|for)$/i ? ucfirst($_) : lc($_) }
                split(/\s/, $line)
            )
        ) . "</title>\n";
    }
}
Or you may want to loop over your file instead. Either way, you have to filter out the terms you don't want converted, as shown.
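A minimal sketch of looping over a file with this answer's approach, assuming each title sits on a single line. Unlike the tag-stripping regex above, it keeps the opening tag with its attributes (such as id="ABC0123"), which the updated input needs. The sub name fix_title_line is hypothetical, and the loop only runs when a file name is given on the command line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Words kept lowercase, same list as the regex in the answer above.
my %small = map { $_ => 1 } qw(in the on or to for);

# Recase one line if it holds a complete <title ...>...</title> element;
# any other line is returned unchanged.
sub fix_title_line {
    my ($line) = @_;
    return $line
        unless $line =~ m{^(\s*<title\b[^>]*>)(.*?)(</title>.*)$}s;
    my ($open, $text, $rest) = ($1, $2, $3);
    # ucfirst every word not in %small, lowercase the ones that are,
    # then ucfirst the whole title so the first word is always capitalized.
    my $cased = ucfirst(join(' ',
        map { $small{ lc($_) } ? lc($_) : ucfirst($_) } split(' ', $text)));
    return $open . $cased . $rest;
}

# Usage: perl fix_titles.pl foo.txt > fixed.txt
if (@ARGV) {
    while (my $line = <>) {
        print fix_title_line($line);
    }
}
```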
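Answer 1 (score: 0)
A new answer matching the updated question (the sample input and desired output have changed since the original question). Updated again on 9 March 2014, at the OP's request, so that the first word in the title tag is always capitalized.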
#!/usr/bin/perl
use strict;
use warnings;

# Add your articles and prepositions here!!!
my @lowercase = qw(a an at for in is the to);

# Use a hash since lookup is easier later.
my %lowercase;
# Populate the hash with keys from @lowercase.
# Values could be anything, since only exists() is checked; reusing @lowercase is simply convenient.
@lowercase{@lowercase} = @lowercase;

open(my $fh, '<', 'foo.txt') or die "Cannot open foo.txt: $!";
while (<$fh>) {
    if (m/^<title/i) {
        chomp;
        my @words;
        my $line = $_;

        # Save the opening <title> tag, including attributes such as id="..."
        my $titleTag = $line;
        $titleTag =~ s/^(<[^>]*>).*/$1/;

        # Remove any tags in <brackets>
        $line =~ s/<[^>]*>//g;

        # Uppercase the first letter in every word, except for those in the list.
        # The first word is always capitalized.
        my $first = 1;
        foreach my $word (split(/\s/, $line)) {
            if ($first) {
                $first = 0;
                push(@words, ucfirst($word));
                next;
            }
            # Look up case-insensitively so that "The" is also lowercased.
            if (exists $lowercase{ lc($word) }) { push(@words, lc($word)) }
            else                                { push(@words, ucfirst($word)) }
        }
        print $titleTag . join(" ", @words) . "</title>\n";
    }
    else {
        print $_;
    }
}
close($fh);
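This code makes two assumptions:
1. Each <title>...</title> element is on a single line; it never wraps across more than one line of the file.
2. The <title> tag is at the start of the line. This can easily be changed in the code if needed.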
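A minimal sketch of such a change, assuming the whole file fits in memory: read it as one string (slurp mode) and rewrite every <title ...>...</title> element with a single s///gse substitution, so neither assumption is needed. The names title_case and fix_titles are hypothetical, the word list is the @lowercase list from the answer, and the casing sub also applies the question's colon rule. A title that wraps across lines comes out rejoined with single spaces.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Illustrative word list (the same words as @lowercase in the answer).
my %lowercase = map { $_ => 1 } qw(a an at for in is the to);

# List words stay lowercase, except the first word and any word that
# directly follows a word ending in ':'; all other words get ucfirst.
sub title_case {
    my ($text) = @_;
    my @out;
    my $force_cap = 1;                       # first word is always capitalized
    for my $word (split ' ', $text) {        # ' ' also splits on newlines
        if (!$force_cap && exists $lowercase{ lc($word) }) {
            push @out, lc($word);
        }
        else {
            push @out, ucfirst($word);
        }
        $force_cap = $word =~ /:$/ ? 1 : 0;  # capitalize the word after a colon
    }
    return join ' ', @out;
}

# Rewrite every <title ...>...</title> in a whole document, wherever it
# starts on the line and even if it spans several lines (/s).
sub fix_titles {
    my ($doc) = @_;
    $doc =~ s{(<title\b[^>]*>)(.*?)(</title>)}
             {do { my ($open, $text, $close) = ($1, $2, $3);
                   $open . title_case($text) . $close }}gse;
    return $doc;
}

# Usage: perl fix_titles.pl foo.txt > fixed.txt
if (@ARGV) {
    local $/;                                # slurp mode: one read per file
    while (my $doc = <>) {
        print fix_titles($doc);
    }
}
```

Copying $1, $2 and $3 into lexicals before calling title_case keeps the captures safe from the regex match that runs inside the sub.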