How do I search a string backward from the question mark "?"?

Time: 2017-10-25 17:00:15

Tags: php arrays regex

For example, I have this:

$string = 'PHP is a server side web programming language , Do you like PHP ?  , PHP is fantastic';

$array = array('html','css','javascript','ajax','html5','css3','jquery','PHP');

foreach($array as $ar){
   //Check if one of the $array values exists before the question mark '?' in the $string
}

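I want to search $string only at the position directly before the question mark "?". If the $array value "PHP" does not sit directly before the question mark, nothing should happen, because it counts as not found. "PHP" could be any other value in $array, so I don't know the length of the value to look for: the word can be repeated and can have a different length.

For example: $string = 'html .... , html is fantastic , Do you like html? , I love html'; here the word is longer, and it could be longer still.

How can I find only the "PHP" that sits directly before the question mark, and get the length of the word that follows "like" in 'Do you like PHP ?'?

1 answer:

Answer 0 (score: 0)

You can do what you need with a regular expression, but you get more flexibility if you tokenize the text: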

<?php
$string = 'PHP is a server side web programming language , Do you like PHP?, Do you like Javascript ? What is Ajax?? Coding is fun.';
$find = ['html','css','javascript','ajax','html5','css3','jquery','php'];

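// Convert to lowercase and pad punctuation with spaces. The hyphen goes last in the
// character class so it is literal; written as '-_ it forms a range that swallows ? , and .
$tokenized_string = preg_replace("/([^a-zA-Z0-9'_ -])/", ' \1 ', strtolower($string));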

// Condense multiple sequential spaces into a single space
$tokenized_string = preg_replace('/ {2,}/', ' ', $tokenized_string);

// Tokenize the text into words
$words = explode(' ', $tokenized_string);

// Find search terms directly preceding a question mark token
$question_words = array_filter(
    array_intersect($words, $find),
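    function($k) use ($words) {
        // Check that a next token exists instead of suppressing the notice with @
        return isset($words[$k + 1]) && $words[$k + 1] === '?';
    },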
    ARRAY_FILTER_USE_KEY
);

// Output our matches
var_dump($question_words);

This creates a normalized array of tokens, $words, like:

array(30) {
  [0] =>
  string(3) "php"
  [1] =>
  string(2) "is"
  [2] =>
  string(1) "a"
  [3] =>
  string(6) "server"
  [4] =>
  string(4) "side"
  [5] =>
  string(3) "web"
  [6] =>
  string(11) "programming"
  [7] =>
  string(8) "language"
  [8] =>
  string(1) ","
  [9] =>
  string(2) "do"
  [10] =>
  string(3) "you"
  [11] =>
  string(4) "like"
  [12] =>
  string(3) "php"
  [13] =>
  string(1) "?"
  [14] =>
  string(1) ","
  [15] =>
  string(2) "do"
  [16] =>
  string(3) "you"
  [17] =>
  string(4) "like"
  [18] =>
  string(10) "javascript"
  [19] =>
  string(1) "?"
  [20] =>
  string(4) "what"
  [21] =>
  string(2) "is"
  [22] =>
  string(4) "ajax"
  [23] =>
  string(1) "?"
  [24] =>
  string(1) "?"
  [25] =>
  string(6) "coding"
  [26] =>
  string(2) "is"
  [27] =>
  string(3) "fun"
  [28] =>
  string(1) "."
  [29] =>
  string(0) ""
}

It returns an array of the search terms found directly before a question mark, keyed by their position in the $words array:

array(3) {
  [12] =>
  string(3) "php"
  [18] =>
  string(10) "javascript"
  [22] =>
  string(4) "ajax"
}
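
The question also asks for the length of the matched word. A minimal sketch, assuming the result has the shape shown above (token position => matched term); the $lengths variable is a name introduced here for illustration:

```php
<?php
// Example result in the shape shown above: token position => matched term.
$question_words = [12 => 'php', 18 => 'javascript', 22 => 'ajax'];

// array_map() preserves the keys when it is given a single array,
// so each length stays keyed by the term's position in $words.
$lengths = array_map('strlen', $question_words);

var_dump($lengths);
```

Because the keys are kept, each length can still be traced back to the token it belongs to.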

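The tokenizing approach assumes that none of your search terms contain punctuation, such as node.js, although it can easily be adapted to handle that case.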
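
For terms that contain punctuation, one option is to swap the tokenizer for the regular-expression route mentioned at the top of this answer. The sketch below is an assumption about how that could look, not the answerer's code: find_terms_before_question() is a hypothetical helper, and the sample string is made up for illustration.

```php
<?php
// Hypothetical helper: returns the search terms that sit directly before
// a question mark, lowercased, in the order they appear in the text.
function find_terms_before_question(string $text, array $terms): array
{
    if ($terms === []) {
        return [];
    }

    // Try longer terms first, so "html5" wins over "html".
    usort($terms, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    // preg_quote() escapes punctuation such as the dot in "node.js".
    $alternatives = implode('|', array_map(function ($term) {
        return preg_quote($term, '/');
    }, $terms));

    // (?<![A-Za-z0-9]) stops a term from matching inside a longer word;
    // \s*\? allows optional whitespace before the question mark.
    $pattern = '/(?<![A-Za-z0-9])(' . $alternatives . ')\s*\?/i';

    preg_match_all($pattern, $text, $matches);

    return array_map('strtolower', $matches[1]);
}

$string = 'Do you like PHP?, Do you like Node.js ? What is Ajax?? HTML is fine.';
$find = ['html', 'ajax', 'node.js', 'php'];

var_dump(find_terms_before_question($string, $find));
```

Unlike the tokenizer, this version returns only the matched terms and not their token positions; PREG_OFFSET_CAPTURE could be passed to preg_match_all() if byte offsets are needed.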

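It also assumes that you have no multi-word search terms, such as amazon s3. Instead of array_intersect(), you could iterate over the question-mark tokens with array_keys($words, '?') and check the tokens preceding each one against your search terms, based on each term's word count.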
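
The array_keys() idea can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: find_phrases_before_question() is a hypothetical helper, and $words is assumed to be a lowercase token array like the one built above.

```php
<?php
// Hypothetical helper: looks backward from every "?" token and checks whether
// the tokens directly before it spell out one of the (possibly multi-word) terms.
function find_phrases_before_question(array $words, array $terms): array
{
    $found = [];

    // Positions of every question-mark token.
    foreach (array_keys($words, '?', true) as $pos) {
        foreach ($terms as $term) {
            $term_words = explode(' ', strtolower($term));
            $length = count($term_words);

            // Not enough tokens before this question mark for the term.
            if ($length > $pos) {
                continue;
            }

            // array_slice() reindexes from 0, so it compares cleanly
            // against the term's own word list.
            if (array_slice($words, $pos - $length, $length) === $term_words) {
                // Key each match by the position of its first token.
                $found[$pos - $length] = $term;
            }
        }
    }

    return $found;
}

$words = ['do', 'you', 'use', 'amazon', 's3', '?', 'or', 'php', '?', '?'];

var_dump(find_phrases_before_question($words, ['amazon s3', 'php']));
```

A doubled question mark, as in the Ajax example, is harmless here: the second "?" is preceded by another "?" token, which matches no search term.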