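Get all sentences that contain a specific word

Time: 2014-04-09 11:38:02

Tags: php regex

I am trying to extract, from a text made up of several sentences, every sentence that contains one of a set of words.

Here is my code, which is also available at: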

http://ideone.com/fork/O9XtOY

<?php
$var = array('one','of','here','Another');
$str = 'Start of sentence one. This is a wordmatch one two three four! Another, sentence here.';
foreach ($var as $val)
{
    $m =$val; // word 
    $regex = '/[A-Z][^\.\!\;]*('.$m.')[^\.;!]*/';
    //
    if (preg_match($regex, $str, $match))
    {
        echo $match[0];     
        echo "\n";
    }
}
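  1. Why does it not print the last sentence, even though both "here" and "Another" appear in it?
  2. How can I skip a sentence if it is already in the list? I want to remove the redundancy, and I want to store the sentences in some data structure/variable so that I can use all of them later.

2 answers:

Answer 0: (score: 1)

This will solve your problem: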

<?php
$var = array('one','of','here','Another');
$str = 'Start of sentence one. This is a wordmatch one two three four! Another, sentence here.';
foreach ($var as $val)
{
    if (stripos($str, $val) !== false)
    {
        echo $val;
        echo "\n";
    }
}

Answer 1: (score: 1)

I'd say your approach is a bit overcomplicated. It is easier to:

  1. first get all sentences,
  2. then filter this set by your criteria.

E.g.:

    // keywords to search for
    $needles = array('one', 'of', 'here', 'Another');
    
    // input text
    $text = 'Start of sentence one. This is a wordmatch one two three four! Another, sentence here.';
    
    // get all sentences (the pattern could be too simple though)
    if (preg_match_all('/.+?[!.]\s*/', $text, $match)) {
    
      // select only those fitting the criteria
      $hits = array_filter($match[0], function ($sentence) use($needles) {
    
        // check each keyword
        foreach ($needles as $needle) {
          // return early on first hit (or-condition)
          if (false !== strpos($sentence, $needle)) {
            return true;
          }
        }
    
        return false;
      });
    
      // log output
      print_r($hits);
    }
    

Demo: http://ideone.com/pZfOb5


A note about:

    if (preg_match_all('/.+?[!.]\s*/', $text, $match)) {
    

About the pattern:

    .+?   // select at least one char, ungreedy
    [!.]  // until one of the given sentence
          // delimiters is found (could/should be extended as needed)
    \s*   // add all following whitespace
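
To see what this pattern yields on the sample text from the question, here is a minimal sketch. The variable name `$sentences` and the extra `trim` step are assumptions made for illustration, not part of the answer above. Note that a trailing fragment without a closing `.` or `!` would not be captured by this pattern.

```php
<?php
// Sample text taken from the question.
$text = 'Start of sentence one. This is a wordmatch one two three four! Another, sentence here.';

// Split the text into sentences with the pattern described above.
// $sentences is a name chosen for this sketch only.
$sentences = array();
if (preg_match_all('/.+?[!.]\s*/', $text, $match)) {
    // Each match still carries the whitespace consumed by \s*, so trim it.
    $sentences = array_map('trim', $match[0]);
}

print_r($sentences);
```

With this input the sketch should list three entries, one per sentence, each ending in its own delimiter.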
    

    array_filter($match[0], function ($sentence) use($needles) {
    

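array_filter does just what its name says. It returns a filtered version of the input array (here $match[0]). The supplied callback (the inline function) is called for each element of the array and should return true/false to decide whether the current element should be part of the new array. The use syntax makes the required $needles array accessible inside the function.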
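
One detail worth knowing when storing the result for later use: array_filter keeps the keys of the input array, so the filtered array may have gaps in its indexes. The sketch below illustrates this with hard-coded sentences. The needle list and the `$stored` variable are assumptions chosen for the example. Since each sentence is tested only once, it can appear in the result at most once, no matter how many keywords it contains.

```php
<?php
// Sentences as the splitting step would produce them (hard-coded for brevity).
$sentences = array(
    'Start of sentence one.',
    'This is a wordmatch one two three four!',
    'Another, sentence here.',
);

// Hypothetical keyword list for this sketch.
$needles = array('wordmatch', 'here');

// Keep a sentence as soon as any keyword occurs in it (case-sensitive).
$hits = array_filter($sentences, function ($sentence) use ($needles) {
    foreach ($needles as $needle) {
        if (false !== strpos($sentence, $needle)) {
            return true;
        }
    }
    return false;
});

// array_filter preserves the original keys, here 1 and 2.
print_r(array_keys($hits));

// Reindex from 0 if a plain list is more convenient later on.
$stored = array_values($hits);
print_r($stored);
```

If case-insensitive matching is wanted instead, `stripos` could be swapped in for `strpos`, as in the other answer.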