Search and replace using Perl

Time: 2014-02-26 05:19:09

Tags: perl

I have some tags whose values look like this:

<section>
<title id="ABC0123">is The human nervous system?</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
<section>
<title id="DEF0123">Terms for anatomical directions in the nervous system</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
</section>
<section>
<title id="ABC4356">Anatomical terms: is referring to directions</title>
.
.
.

The output I need is as follows:

<section>
<title id="ABC0123">Is the Human Nervous System?</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
</section>
<section>
<title id="DEF0123">Terms for Anatomical Directions in the Nervous System</title>
<para>A tag is a keyword or label that categorizes your question with other, similar questions</para>
<section>
<title id="ABC4356">Anatomical Terms: Is Referring to Directions</title>
.
.

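How can I do this with Perl? All prepositions and articles should be lowercase here. The requirement has now changed slightly, as follows.

If a word from @lowercase (say, "is") is the first word of the title and is in lowercase, it should be capitalized. Likewise, any @lowercase word that comes right after a colon should be capitalized.

2 answers:

Answer 0: (score: 2)

It could be something like this: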

#!/usr/bin/env perl
use strict;
use warnings;

my $lines = qq#
<title>The human nervous system</title>
<title>Terms for anatomical directions in the nervous system</title>
<title>Anatomical terms referring to directions</title>
#;

foreach my $line ( split(/\n/, $lines ) ) {

    $line =~ s|</?title>||g;

    if ( $line =~ /\w+/ ) {               # Skip if blank
        print "<title>" . ucfirst(
           join(" ",
               map{ !/^(in|the|on|or|to|for)$/i ? ucfirst($_) : lc($_); }
               split(/\s/, $line )
           )
        ) ."<\/title>\n";

    }
}

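Or you may want to loop over your file instead. Either way, you have to filter out the terms you do not want converted, as I have shown.

Answer 1: (score: 0)

A new answer to match the updated question (the sample input and desired output have changed since the original question). Updated again on 2014-03-09, at the OP's request, so that the first word in the title tag is always capitalized.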
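To loop over a file rather than an embedded string, something along these lines may work. It is a sketch, not the answerer's code: `fix_title` and `fix_line` are hypothetical helper names, and the in-memory `$sample` handle stands in for a real file opened with `open my $in, '<', $path`. It assumes each title sits on a single line, and it keeps attributes such as `id` on the `<title>` tag.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Words kept lowercase unless they start the title (same list as above).
my $small = qr/^(?:in|the|on|or|to|for)$/i;

# Title-case one title string (hypothetical helper).
sub fix_title {
    my ($text) = @_;
    my @words = map { /$small/ ? lc($_) : ucfirst($_) } split /\s+/, $text;
    return ucfirst( join ' ', @words );
}

# Rewrite the text inside <title ...>...</title>, keeping the tags as they are.
sub fix_line {
    my ($line) = @_;
    $line =~ s{(<title[^>]*>)(.*?)(</title>)}{
        my ( $open, $text, $close ) = ( $1, $2, $3 );
        $open . fix_title($text) . $close;
    }e;
    return $line;
}

# Assumption: sample data in an in-memory handle, standing in for a real file.
my $sample = <<'XML';
<section>
<title id="DEF0123">Terms for anatomical directions in the nervous system</title>
<para>A tag is a keyword or label</para>
</section>
XML

open my $in, '<', \$sample or die "Cannot open sample: $!";
while ( my $line = <$in> ) {
    print fix_line($line);    # non-title lines pass through unchanged
}
close $in;
```

Lines without a `<title>` element are printed as they were read, so the surrounding `<section>` and `<para>` markup is left alone.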

#!/usr/bin/perl

use strict;
use warnings;

# Add your articles and prepositions here!!!
my @lowercase = qw(a an at for in is the to);

# Use a hash since lookup is easier later.
my %lowercase;
# Populate the hash with keys and values from @lowercase.
# Values could have been anything, but it needs to match the number of keys, so this is easiest.
@lowercase{@lowercase} = @lowercase;

open(my $fh, '<', "foo.txt") or die "Cannot open foo.txt: $!";
while (<$fh>) {
  if (m/^<title/i) {
    chomp;
    my @words;
    my $line = $_;
    # Save the opening <title> tag, including any attributes
    my $titleTag = $line;
    $titleTag =~ s/^(<[^>]*>).*/$1/;
    # Remove any tags in <brackets>
    $line =~ s/<[^>]*>//g;
    # Uppercase the first letter in every word, except for those in a certain list.
    # The first word is always uppercased.
    my $first = 1;
    foreach my $word (split(/\s+/, $line)) {
      if ($first) {
        $first = 0;
        push(@words, ucfirst($word));
        next;
      }
      # Compare case-insensitively so that "The" is lowercased to "the"
      if (exists $lowercase{lc $word}) { push(@words, lc $word) }
      else { push(@words, ucfirst($word)) }
    }
    print $titleTag . join(" ", @words) . "</title>\n";
  }
  else {
    print $_;
  }
}
close($fh);

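This code makes 2 assumptions:

  1. Each <title>...</title> is on a single line. It never wraps across more than one line of the file.
  2. The opening <title> tag is at the beginning of the line. This can easily be changed in the code if needed.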
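The question also asks that a listed word be capitalized when it comes right after a colon. A minimal sketch of that rule follows. It is not part of the original answer: `title_case` and `fix_line` are hypothetical helper names, and the word list is the one from the answer. The substitution matches `<title>` anywhere in the line, which relaxes assumption 2, but it still relies on assumption 1.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same word list as in the answer; extend as needed.
my %lowercase = map { $_ => 1 } qw(a an at for in is the to);

# Hypothetical helper: title-case a string, forcing a capital on the
# first word and on any word that directly follows a colon.
sub title_case {
    my ($text) = @_;
    my @out;
    my $force = 1;    # true for the first word and right after a colon
    for my $word ( split ' ', $text ) {
        if ( !$force && $lowercase{ lc $word } ) {
            push @out, lc $word;
        }
        else {
            push @out, ucfirst $word;
        }
        $force = ( $word =~ /:$/ ) ? 1 : 0;
    }
    return join ' ', @out;
}

# Hypothetical helper: apply title_case to a <title> found anywhere in a line.
sub fix_line {
    my ($line) = @_;
    $line =~ s{(<title[^>]*>)(.*?)(</title>)}{
        my ( $open, $text, $close ) = ( $1, $2, $3 );
        $open . title_case($text) . $close;
    }e;
    return $line;
}

print fix_line($_), "\n" for (
    '<title id="ABC0123">is The human nervous system?</title>',
    '  <title id="ABC4356">Anatomical terms: is referring to directions</title>',
);
```

For the two sample titles this prints `Is the Human Nervous System?` and `Anatomical Terms: Is Referring to Directions` inside their original tags, matching the output requested in the question.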