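I have strings of the following kind (the quotes indicate that each one is on a single line):
"AMINO-2,4,6-TRIIODOBENZOIC ACIDS Hugo Holtermann, Baerum, Leif Gunnar Haugen, Oslo, and Knut Wille, Baerum, Norway, assignors to Nye- 5"
"PROCESS FOR THE PRODUCTION OF ETHYLENIC COMPOUNDS Duncan Clark and Percy Hayden, Norton-on-Tees, Eng- 5 land, assignors to ImperiaI Chemical Industries Limited, London, England"
I want everything after the title (the all-uppercase part), so I would like to get:
"Hugo Holtermann, Baerum, Leif Gunnar Haugen, Oslo, and Knut Wille, Baerum, Norway, assignors to Nye- 5"
"Duncan Clark and Percy Hayden, Norton-on-Tees, Eng- 5 land, assignors to ImperiaI Chemical Industries Limited, London, England"
I have many more strings than these two, but the basic format is always the same: the title of the invention consists of uppercase letters and digits.
Is there a way to do this with a regular expression in Perl?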
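Answer 0 (score: 1)
If it doesn't need to be 100% accurate, I would simply look for the first uppercase letter that is immediately followed by a lowercase letter and take everything from there.
Something like this (my Perl is a bit rusty, so forgive any syntax errors):
my ($part_of_line) = $full_line =~ /([A-Z][a-z].*)/;  # list context, so the capture is assigned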
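To try the heuristic end to end, here is a minimal sketch; the subroutine name `after_title_heuristic` and the sample array are made up for illustration. Note that the capture only lands in the variable when the match is evaluated in list context (`my ($part_of_line) = ...`); in scalar context `=~` returns just a true/false value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return everything from the first uppercase letter
# that is immediately followed by a lowercase letter (undef if none).
sub after_title_heuristic {
    my ($full_line) = @_;
    # List context: the match returns the captured group, not a boolean.
    my ($part_of_line) = $full_line =~ /([A-Z][a-z].*)/;
    return $part_of_line;
}

my @lines = (
    'AMINO-2,4,6-TRIIODOBENZOIC ACIDS Hugo Holtermann, Baerum, Leif Gunnar Haugen, Oslo, and Knut Wille, Baerum, Norway, assignors to Nye- 5',
    'PROCESS FOR THE PRODUCTION OF ETHYLENIC COMPOUNDS Duncan Clark and Percy Hayden, Norton-on-Tees, Eng- 5 land, assignors to ImperiaI Chemical Industries Limited, London, England',
);

for my $line (@lines) {
    my $rest = after_title_heuristic($line);
    print defined $rest ? "$rest\n" : "(no match)\n";
}
```

As the answer warns, this is not exact: for a line ending in a name written with an initial, such as `... COMPOUNDS D.Clark`, the first uppercase-then-lowercase pair is `Cl`, so the result is `Clark` and the `D.` is lost.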
Answer 1 (score: 0)
Try this:
my $text = "PROCESS FOR THE PRODUCTION OF ETHYLENIC COMPOUNDS Duncan Clark and Percy Hayden, Norton-on-Tees, Eng- 5 land, assignors to ImperiaI Chemical Industries Limited, London, England ";
if ($text =~ m/(\b[A-Z0-9-, ]+)\b(.*)/) {
    print "$2\n";
}
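A minimal self-contained sketch that wraps this pattern in a hypothetical `after_title_boundary` subroutine and runs it on both sample lines. One thing worth knowing: the `\b` after group 1 only requires a word boundary, so with a name written as an initial, such as `... COMPOUNDS D.Clark`, the title run can stop right after the `D` (a letter followed by a dot is a boundary) and the result is `.Clark`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the answer's pattern:
#   group 1 = run of uppercase letters, digits, hyphens, commas and spaces
#   \b      = that run must end at a word boundary
#   group 2 = the rest of the line (returned; undef if there is no match)
sub after_title_boundary {
    my ($text) = @_;
    return $text =~ m/(\b[A-Z0-9-, ]+)\b(.*)/ ? $2 : undef;
}

my @lines = (
    'AMINO-2,4,6-TRIIODOBENZOIC ACIDS Hugo Holtermann, Baerum, Leif Gunnar Haugen, Oslo, and Knut Wille, Baerum, Norway, assignors to Nye- 5',
    'PROCESS FOR THE PRODUCTION OF ETHYLENIC COMPOUNDS Duncan Clark and Percy Hayden, Norton-on-Tees, Eng- 5 land, assignors to ImperiaI Chemical Industries Limited, London, England',
);

for my $line (@lines) {
    my $rest = after_title_boundary($line);
    print defined $rest ? "$rest\n" : "(no match)\n";
}
```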
Answer 2 (score: 0)
I tried this and got the output you expected:
# $ip holds one input line
if ($ip =~ m/([A-Z0-9,\- ]+)([A-Z]+[a-z]+.*)/) {
    print "$2\n";
}
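A small self-contained sketch of the same pattern, wrapped in a hypothetical `after_title_mixed_case` subroutine. Because group 2 has to start with uppercase letters immediately followed by lowercase ones, and group 1 has to end right before them, a line whose name begins with an initial, such as `... COMPOUNDS D.Clark`, does not match at all, and the sketch reports no match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the answer's pattern:
#   group 1 = run of uppercase letters, digits, commas, hyphens and spaces
#   group 2 = uppercase run + lowercase run, then the rest of the line
sub after_title_mixed_case {
    my ($ip) = @_;
    return $ip =~ m/([A-Z0-9,\- ]+)([A-Z]+[a-z]+.*)/ ? $2 : undef;
}

my @lines = (
    'AMINO-2,4,6-TRIIODOBENZOIC ACIDS Hugo Holtermann, Baerum, Leif Gunnar Haugen, Oslo, and Knut Wille, Baerum, Norway, assignors to Nye- 5',
    'PROCESS FOR THE PRODUCTION OF ETHYLENIC COMPOUNDS Duncan Clark and Percy Hayden, Norton-on-Tees, Eng- 5 land, assignors to ImperiaI Chemical Industries Limited, London, England',
    'PROCESS FOR THE PRODUCTION OF ETHYLENIC COMPOUNDS D.Clark',
);

for my $line (@lines) {
    my $rest = after_title_mixed_case($line);
    print defined $rest ? "$rest\n" : "(no match)\n";
}
```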
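Answer 3 (score: 0)
The title always ends with an uppercase letter followed by a space, and it contains only uppercase letters, digits, commas, hyphens and spaces, so this should work on a line held in `$_`:
if (/^[A-Z0-9,\- ]*[A-Z] (.+)$/) {
    print "$1\n";
}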
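Why the prefix matters: with a greedy `^.+` in front of `[A-Z]+ `, the capture starts after the *last* uppercase letter followed by a space, and the second sample has one in "ImperiaI Chemical". The sketch below, with hypothetical subroutine names, runs a greedy `.+` prefix and a prefix restricted to title characters side by side on that line; the set of title characters is an assumption taken from the sample titles.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Greedy prefix: .+ first takes the whole line and gives characters back,
# so the capture starts after the LAST "uppercase letters + space".
sub after_last_caps_space {
    my ($line) = @_;
    return $line =~ /^.+[A-Z]+ (.+)$/ ? $1 : undef;
}

# Prefix limited to characters assumed to occur in titles (uppercase
# letters, digits, commas, hyphens, spaces), so it cannot run past the title.
sub after_title_caps_space {
    my ($line) = @_;
    return $line =~ /^[A-Z0-9,\- ]*[A-Z] (.+)$/ ? $1 : undef;
}

my $line = 'PROCESS FOR THE PRODUCTION OF ETHYLENIC COMPOUNDS Duncan Clark and Percy Hayden, Norton-on-Tees, Eng- 5 land, assignors to ImperiaI Chemical Industries Limited, London, England';

print 'greedy:     ', after_last_caps_space($line), "\n";
print 'restricted: ', after_title_caps_space($line), "\n";
```

The greedy version prints only `Chemical Industries Limited, London, England`, while the restricted one returns everything from `Duncan Clark` onward.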
Answer 4 (score: 0)
How about this:
#!/usr/bin/perl
use strict;
use warnings;
use 5.014;
my $re = qr/
    ^                 # start of string
    [\p{Lu}\pN, -]+   # one or more uppercase letters, digits, commas, spaces or hyphens
                      # (the space inside the class stays literal under the x flag)
    (                 # start group 1
        \p{Lu}[\pL.'] # an uppercase letter followed by a letter, dot or apostrophe
    )                 # end group 1
/x;
while (<DATA>) {
    chomp;
    s/$re/$1/g;       # replace the whole match with group 1
    say;
}
__DATA__
AMINO-2,4,6-TRIIODOBENZOIC ACIDS Hugo Holtermann, Baerum, Leif Gunnar Haugen, Oslo, and Knut Wille, Baerum, Norway, assignors to Nye- 5
PROCESS FOR THE PRODUCTION OF ETHYLENIC COMPOUNDS Duncan Clark and Percy Hayden, Norton-on-Tees, Eng- 5 land, assignors to ImperiaI Chemical Industries Limited, London, England
PROCESS FOR THE PRODUCTION OF ETHYLENIC COMPOUNDS D.Clark
PROCESS FOR THE PRODUCTION OF ETHYLENIC COMPOUNDS O'Connors
Output:
Hugo Holtermann, Baerum, Leif Gunnar Haugen, Oslo, and Knut Wille, Baerum, Norway, assignors to Nye- 5
Duncan Clark and Percy Hayden, Norton-on-Tees, Eng- 5 land, assignors to ImperiaI Chemical Industries Limited, London, England
D.Clark
O'Connors