Question

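I was able to extract everything successfully based on your suggestions. As expected, my problem was that the regular expression was not matching some things correctly. Thanks a lot! Here is my final code; I hope it helps someone.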

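        if ($_ =~ /Research Interests/) {
            $research = "Research Interest";

            # /s lets "." match newlines; /g is dropped because only one match is needed
            if ($_ =~ m/<h2>Research Interests<\/h2>.*?<p>(.*?)<\/p>/s) {
                # split on commas, trimming the whitespace around each item
                @researchInterests = split(/\s*,\s*/, $1);
                $count = 1;
                foreach (@researchInterests) {
                    print "$research $count: $_\n";
                    $count++;
                }
            }
        }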

Answer 1

The problem is that you are reading only one line at a time. Why not read the whole file and match against that?

my $file;
{
    local $/;
    $file = <FILE>;
}
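
As a rough sketch of how the slurped string could then be matched: the sample HTML below is made up for illustration (the original page is not shown in the question) and simply stands in for what `$file` would hold after reading from `FILE`. The `/s` modifier is what allows `.` to match newlines, so the pattern can run across line breaks.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the slurped file contents.
my $file = <<'HTML';
<h2>Research Interests</h2>
<p>Data mining,
databases, information retrieval</p>
HTML

# /s lets "." match newlines, so the match can span several lines.
my @interests;
if ($file =~ m{<h2>Research Interests</h2>.*?<p>(.*?)</p>}s) {
    (my $text = $1) =~ s/\s+/ /g;           # collapse line breaks into spaces
    @interests = split /\s*,\s*/, $text;    # split on commas, trimming whitespace
}
print "$_\n" for @interests;
```

The non-greedy `.*?` keeps the match from running past the first closing `</p>` after the heading.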

Answer 2

You can just read more lines at that point:

while (<FILE>) {
  if (m/Research Interests/) {
    while (<FILE>) {
      if (m/<p>(.*)<\/p>/) {    # match up to the closing </p> tag
        print "Research Interests: $1\n";
        last;
      }
    }
  }
}

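I don't know whether your file is large, but it is worth learning techniques that don't require reading the whole file at once, so that you can handle arbitrarily large files or work with streams.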
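
If the paragraph itself may be split over several lines, one possible streaming variant is to buffer only the lines that follow the heading and stop as soon as a complete paragraph has been seen. This is a sketch under assumptions, not a drop-in replacement: the sample input, the in-memory filehandle, and the variable names (`$seen_heading`, `$buffer`, `$interests`) are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample input; an in-memory filehandle stands in for a real file.
my $html = "<h1>Bio</h1>\n<p>Unrelated</p>\n"
         . "<h2>Research Interests</h2>\n<p>Data mining,\ndatabases</p>\n<p>Later</p>\n";
open my $fh, '<', \$html or die "Can't open in-memory handle: $!";

my ($seen_heading, $buffer, $interests) = (0, '');
while (my $line = <$fh>) {
    $seen_heading = 1 if $line =~ /Research Interests/;
    next unless $seen_heading;              # skip everything before the heading
    $buffer .= $line;                       # keep only lines from the heading on
    if ($buffer =~ m{<p>(.*?)</p>}s) {      # a complete paragraph is buffered
        $interests = $1;
        last;                               # stop reading; the rest is not needed
    }
}
close $fh;
print "Research Interests: $interests\n" if defined $interests;
```

Only the lines between the heading and the end of the first paragraph are ever held in memory, which is the point of reading the file as a stream.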

Answer 3

If you must do it this way, you can try setting the input record separator to undef:

#!/usr/bin/perl
use warnings;
use strict;

my $infile = 'in.txt';
open my $input, '<', $infile or die "Can't open $infile: $!";

my $research_interests;
$/ = undef;    # slurp mode: the whole file is read as a single record
while (<$input>) {
    if ($_ =~ /(Research Interests)/) {
        $research_interests = $1;
        # /s lets "." match newlines, so the paragraph may span several lines
        if ($_ =~ m/<p>(.*?)<\/p>/s) {
            print "Title: $research_interests\nInterests: $1\n";
        }
    }
}
close $input;

Prints:

Title: Research Interests
Interests: Data mining, databases, information retrieval

Regular expression across multiple lines

3 answers: