Question

For example, I have this:

$string = 'PHP is a server side web programming language , Do you like PHP ?  , PHP is fantastic';

$array = array('html','css','javascript','ajax','html5','css3','jquery','PHP');

foreach($array as $ar){
   //Check if one of the $array values exists before the question mark '?' in the $string
}

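I want to search $string only at the position directly before the question mark '?'. If the $array value 'PHP' is not directly before the question mark, nothing should happen, as if it were not there. 'PHP' could be any other value in $array, so I don't know the length of the value that should be found; the word can be repeated and can have a different length.

For instance: $string = 'html .... , html is fantastic , Do you like html? , I love html'; here the word is longer, and it could be longer still.

How can I find only the 'PHP' that comes directly before the question mark, that is, the word after 'like' in ['Do you like PHP ?'], whatever its length?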

Answer 1

You can do what you want with a regular expression, but you gain more flexibility if you tokenize the text:

<?php
$string = 'PHP is a server side web programming language , Do you like PHP?, Do you like Javascript ? What is Ajax?? Coding is fun.';
$find = ['html','css','javascript','ajax','html5','css3','jquery','php'];

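// Convert to lowercase and pad punctuation with spaces.
// The hyphen sits last in the character class so it is a literal "-", not the range '-_ (which would cover "?", "," and ".").
$tokenized_string = preg_replace("/([^a-zA-Z0-9'_ -])/", ' \1 ', strtolower($string));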

// Condense multiple sequential spaces into a single space
$tokenized_string = preg_replace('/ {2,}/', ' ', $tokenized_string);

// Tokenize the text into words
$words = explode(' ', $tokenized_string);

// Find search terms directly preceding a question mark token
$question_words = array_filter(
    array_intersect($words, $find),
    function($k) use ($words) {
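        // Check that a next token exists instead of suppressing the notice with @
        return isset($words[$k + 1]) && $words[$k + 1] === '?';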
    },
    ARRAY_FILTER_USE_KEY
);

// Output our matches
var_dump($question_words);

This creates a normalized array of tokens, $words, like:

array(30) {
  [0] =>
  string(3) "php"
  [1] =>
  string(2) "is"
  [2] =>
  string(1) "a"
  [3] =>
  string(6) "server"
  [4] =>
  string(4) "side"
  [5] =>
  string(3) "web"
  [6] =>
  string(11) "programming"
  [7] =>
  string(8) "language"
  [8] =>
  string(1) ","
  [9] =>
  string(2) "do"
  [10] =>
  string(3) "you"
  [11] =>
  string(4) "like"
  [12] =>
  string(3) "php"
  [13] =>
  string(1) "?"
  [14] =>
  string(1) ","
  [15] =>
  string(2) "do"
  [16] =>
  string(3) "you"
  [17] =>
  string(4) "like"
  [18] =>
  string(10) "javascript"
  [19] =>
  string(1) "?"
  [20] =>
  string(4) "what"
  [21] =>
  string(2) "is"
  [22] =>
  string(4) "ajax"
  [23] =>
  string(1) "?"
  [24] =>
  string(1) "?"
  [25] =>
  string(6) "coding"
  [26] =>
  string(2) "is"
  [27] =>
  string(3) "fun"
  [28] =>
  string(1) "."
  [29] =>
  string(0) ""
}

It returns an array of the search terms found directly before a question mark, keyed by their position in the $words array:

array(3) {
  [12] =>
  string(3) "php"
  [18] =>
  string(10) "javascript"
  [22] =>
  string(4) "ajax"
}
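
Since the question also asks for the length of the matched word, here is a minimal sketch of how the result could be used. It assumes $question_words has the shape shown above and relies only on strlen(); the output format is illustrative.

```php
<?php
// $question_words in the shape returned above, keyed by token position.
$question_words = [12 => 'php', 18 => 'javascript', 22 => 'ajax'];

// array_map() with a single array keeps the original keys,
// so each length stays attached to its token position.
$lengths = array_map('strlen', $question_words);

foreach ($question_words as $position => $word) {
    printf("%s (token %d) is %d characters long\n", $word, $position, $lengths[$position]);
}
```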

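This assumes you aren't using search terms that contain punctuation, such as node.js, although you could adapt this approach to handle that case fairly easily.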
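
One possible way to cope with terms such as node.js is to skip tokenization and use the regular-expression route mentioned at the top of this answer. The following is only a sketch under stated assumptions: the helper name termsBeforeQuestionMark() is hypothetical, and it treats "directly before" as "separated from the question mark by nothing but optional whitespace".

```php
<?php
// Hypothetical helper (not part of the original answer): returns each search
// term that is followed, after optional whitespace, by a question mark.
function termsBeforeQuestionMark(string $text, array $terms): array
{
    // An empty alternation would match every "?", so bail out early.
    if (!$terms) {
        return [];
    }

    // Escape each term so characters such as "." in "node.js" match literally.
    $alternation = implode('|', array_map(function ($term) {
        return preg_quote($term, '/');
    }, $terms));

    // (?<![\w.]) stops a term from matching in the middle of a longer word;
    // \s*\? allows only whitespace between the term and the question mark.
    $pattern = '/(?<![\w.])(' . $alternation . ')\s*\?/i';

    preg_match_all($pattern, $text, $matches);

    // Normalize case so results are comparable with the tokenized version.
    return array_map('strtolower', $matches[1]);
}

$string = 'Do you like PHP ?  , PHP is fantastic. Is it node.js? I love html';
$find = ['html', 'css', 'javascript', 'node.js', 'php'];

print_r(termsBeforeQuestionMark($string, $find));
```

Unlike the tokenized version, this returns the matched terms in order of appearance without token positions, so it fits best when only the words themselves are needed.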

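It also assumes you don't have any multi-word search terms, such as amazon s3. To handle those, instead of using array_intersect() you could iterate over the question-mark tokens with array_keys($words, '?') and check the tokens preceding each one against your search terms, based on each term's word count.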
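
The array_keys() idea can be sketched roughly as follows. The helper name termsBeforeQuestionTokens() is hypothetical, and the sketch assumes the search terms are separated by single spaces and that $words is already lowercased as in the tokenizer above.

```php
<?php
// Hypothetical helper (not part of the original answer): for each "?" token,
// compare the tokens just before it against every search term.
function termsBeforeQuestionTokens(array $words, array $terms): array
{
    $found = [];

    foreach (array_keys($words, '?', true) as $position) {
        foreach ($terms as $term) {
            $termWords = explode(' ', strtolower($term));
            $count = count($termWords);
            $start = $position - $count;

            // Not enough tokens before this question mark for the term.
            if ($start < 0) {
                continue;
            }

            // Key each match by the position of its first token.
            if (array_slice($words, $start, $count) === $termWords) {
                $found[$start] = $term;
            }
        }
    }

    return $found;
}

// Tokens in the same shape as $words above, for "Do you like Amazon S3? PHP is fun."
$words = ['do', 'you', 'like', 'amazon', 's3', '?', 'php', 'is', 'fun', '.', ''];

var_dump(termsBeforeQuestionTokens($words, ['amazon s3', 'php']));
```

Here only amazon s3 is reported (at token position 3), because php is not directly in front of a question mark.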
