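Problem statement -
We need to extract groups of consecutively occurring words from a string.
The simplest example is shown below, with the input and the expected output.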
set of words => "word1|word2|word3";
Input string => "i m word1 word2 and this is word3 word2 word1+ i am having this word2 word3."
Output => word1 word2
word3 word2 word1
word2 word3
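Note - there is no space in "word1+" or before the period in "word3."
Please treat this as the simplest input. The real input can be arbitrarily complex: the set may contain many words (say 500), and we need to find every group of words from that set that occurs consecutively in the input string.
I am doing this in JavaScript, so what I tried is shown below.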
var pattern = "word1|word2|word3";
var regobj = new RegExp('((('+pattern+')\\s?)+)', "g");
What is wrong with my solution?
For Input string => "i m word1word2 and this is word3word2 word1+ i am having this word2 word3."
it gives this output:
word1word2 -- wrong
word3word2 word1 -- wrong
word2 word3
Why do I want this? The real-world use case:
I want to extract number words from a complex expression, say
"one thousand two+three hundred four+1.3456+log(twenty)"
So I need to extract
one thousand two
three hundred four
twenty
and replace each of them with its numeric equivalent.
Answer 0 (score: 3)
Use word boundaries:
\b(?:word1|word2|word3)\b
Full regex in Perl:
use Data::Dump qw(dump);  # provides dump(), which prints to STDERR in void context
my $str = 'i m word1word2 and this is word3 word2 word1+ i am having this word2 word3.';
my @l = ($str =~ /((?:\b(?:word1|word2|word3)\b(?:\s|\.))+)/g);
dump(@l);
Output:
("word3 word2 ", "word2 word3.")
With the last expression:
my $str = 'one thousand two+three hundred four+1.3456+log(twenty)';
my @l = ($str =~ /((?:\b(?:one|two|three|four|twenty|hundred|thousand)\b\s*)+)/g);
dump(@l);
Output:
("one thousand two", "three hundred four", "twenty")
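Since the question is about JavaScript, here is a rough JavaScript counterpart of the Perl expressions above. It is only a sketch: extractGroups and escapeRegExp are hypothetical helper names, not part of the question or of any library, and it assumes every listed word starts and ends with a word character so that \b behaves as intended. It requires at least one whitespace character between words of the same group, which is what stops "word1word2" from being reported.

```javascript
// Escape regex metacharacters so listed words are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns every run of listed words separated only by whitespace.
function extractGroups(words, input) {
  var alternation = words.map(escapeRegExp).join("|");
  // One whole listed word, then zero or more "whitespace + whole listed word".
  var re = new RegExp(
    "\\b(?:" + alternation + ")\\b(?:\\s+(?:" + alternation + ")\\b)*",
    "g"
  );
  return input.match(re) || [];
}

var words = ["word1", "word2", "word3"];
console.log(extractGroups(words,
  "i m word1 word2 and this is word3 word2 word1+ i am having this word2 word3."));
// [ 'word1 word2', 'word3 word2 word1', 'word2 word3' ]

var numberWords = ["one", "two", "three", "four", "twenty", "hundred", "thousand"];
console.log(extractGroups(numberWords,
  "one thousand two+three hundred four+1.3456+log(twenty)"));
// [ 'one thousand two', 'three hundred four', 'twenty' ]
```

For the "word1word2 ... word3word2 word1+" input from the question, this sketch returns only "word1" and "word2 word3", because the glued tokens are not whole words.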
Answer 1 (score: 0)
For the second part of the question, you can use Lingua::EN::Words2Nums
#!/usr/bin/perl
use strict;
use warnings;
use Lingua::EN::Words2Nums;
my $string = "one thousand two+three hundred four+1.3456+log(twenty)";
my $re = qr(one|thousand|two|three|hundred|four|twenty);
my @groups = split(m/\+/,$string);
for my $group (@groups) {
    my @words = ($group =~ m/\b$re\b/g);
    next unless @words;
    my $number = words2nums("@words");
    print "@words => $number\n";
}
Output:
one thousand two => 1002
three hundred four => 304
twenty => 20
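If you want to stay in JavaScript, the replacement step could be sketched roughly as follows. This does not use Lingua::EN::Words2Nums; UNITS, wordsToNumber and replaceNumberWords are hypothetical names, and the vocabulary is deliberately limited to the words in the sample expression, so treat it as an illustration rather than a complete number parser.

```javascript
// Hypothetical, deliberately tiny vocabulary; extend it for real input.
var UNITS = { one: 1, two: 2, three: 3, four: 4, twenty: 20 };

// Convert a whitespace-separated phrase such as "three hundred four" to a number.
function wordsToNumber(phrase) {
  var total = 0;   // sum of completed "thousand" parts
  var current = 0; // value accumulated since the last "thousand"
  phrase.trim().split(/\s+/).forEach(function (w) {
    if (Object.prototype.hasOwnProperty.call(UNITS, w)) {
      current += UNITS[w];
    } else if (w === "hundred") {
      current = (current || 1) * 100;
    } else if (w === "thousand") {
      total += (current || 1) * 1000;
      current = 0;
    }
  });
  return total + current;
}

// Replace each run of number words in an expression with its numeric value.
function replaceNumberWords(expr) {
  var alternation = Object.keys(UNITS).concat(["hundred", "thousand"]).join("|");
  var re = new RegExp(
    "\\b(?:" + alternation + ")\\b(?:\\s+(?:" + alternation + ")\\b)*",
    "g"
  );
  return expr.replace(re, function (match) {
    return String(wordsToNumber(match));
  });
}

console.log(replaceNumberWords("one thousand two+three hundred four+1.3456+log(twenty)"));
// 1002+304+1.3456+log(20)
```

The callback form of String.prototype.replace is used so that each matched group is converted on its own and everything else in the expression is left untouched.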
Answer 2 (score: 0)
In Perl, you can use split and grep:
perl -e '$w="word1|word2|word3"; while(<>){ print join " ", grep { /^(?:$w)$/ } split /\W/, $_ }'
i m word1 word2 and this is word3 word2 word1+ i am having this word2 word3.
word1 word2 word3 word2 word1 word2 word3
The same in JavaScript:
var input="i m word1 word2 and this is word3 word2 word1+ i am having this word2 word3.";
var r=new RegExp("^(word1|word2|word3)$");
var wr=new RegExp("\\W");
var out = [];
var split = input.split(wr);
for (var i = 0; i < split.length; i++) {
  if (split[i].match(r)) {
    out.push(split[i]);
  }
}
console.log(out);
Output:
["word1", "word2", "word3", "word2", "word1", "word2", "word3"]
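Note that the split-and-filter approach above returns a flat list, so the grouping the question asks for is lost. One possible way to keep it, sketched below, is to split on a capturing group so the separators are retained, and to close a group whenever an unlisted token or a non-whitespace separator shows up. groupRuns is a hypothetical name; the lookup object is used instead of one big alternation on the assumption that this scales more comfortably to a list of several hundred words.

```javascript
// Hypothetical helper: groups consecutive listed words separated only by whitespace.
function groupRuns(words, input) {
  var wanted = {};
  words.forEach(function (w) { wanted[w] = true; });

  // Splitting on a capturing group keeps the separators:
  // tokens sit at even indexes, separators at odd indexes.
  var parts = input.split(/(\W+)/);
  var groups = [];
  var run = [];

  for (var i = 0; i < parts.length; i += 2) {
    var token = parts[i];
    var sepBefore = i > 0 ? parts[i - 1] : "";
    var listed = Object.prototype.hasOwnProperty.call(wanted, token);
    // A run ends at an unlisted token or at a separator that is not pure whitespace.
    if (run.length && (!listed || !/^\s+$/.test(sepBefore))) {
      groups.push(run.join(" "));
      run = [];
    }
    if (listed) {
      run.push(token);
    }
  }
  if (run.length) {
    groups.push(run.join(" "));
  }
  return groups;
}

console.log(groupRuns(["word1", "word2", "word3"],
  "i m word1 word2 and this is word3 word2 word1+ i am having this word2 word3."));
// [ 'word1 word2', 'word3 word2 word1', 'word2 word3' ]
```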