For example, I have this:
$string = 'PHP is a server side web programming language , Do you like PHP ? , PHP is fantastic';
$array = array('html','css','javascript','ajax','html5','css3','jquery','PHP');
foreach($array as $ar){
//Check if one of the $array values exists before the question mark '?' in the $string
}
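I want to search $string only directly before the question mark '?'. If the $array value 'PHP' is not directly before the question mark, nothing should happen, as if it were not there. 'PHP' could be any other value in $array, so I don't know the length of the value to be found; the word can be repeated and can have different lengths.
For example: $string = 'html .... , html is fantastic , Do you like html? , I love html';
Now the word is longer, and it could be longer still.
How can I find only the 'PHP' that comes directly before the question mark (after 'like' in 'Do you like PHP ?'), and what is the length of that word?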
Answer 0 (score: 0)
You can do what you want with a regular expression, but tokenizing the text gives you more flexibility:
<?php
$string = 'PHP is a server side web programming language , Do you like PHP?, Do you like Javascript ? What is Ajax?? Coding is fun.';
$find = ['html','css','javascript','ajax','html5','css3','jquery','php'];
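// Convert to lowercase and add whitespace around punctuation.
// The hyphen goes last in the character class so it is a literal "-";
// written as '-_ it would form a range that also covers "?", "," and ".".
$tokenized_string = preg_replace("/([^a-zA-Z0-9'_ -])/", ' \1 ', strtolower($string));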
// Condense multiple sequential spaces into a single space
$tokenized_string = preg_replace('/ {2,}/', ' ', $tokenized_string);
// Tokenize the text into words
$words = explode(' ', $tokenized_string);
// Find search terms directly preceding a question mark token
$question_words = array_filter(
array_intersect($words, $find),
function($k) use ($words) {
return isset($words[$k + 1]) && $words[$k + 1] === '?';
},
ARRAY_FILTER_USE_KEY
);
// Output our matches
var_dump($question_words);
This creates a normalized array of tokens, $words, like:
array(30) {
[0] =>
string(3) "php"
[1] =>
string(2) "is"
[2] =>
string(1) "a"
[3] =>
string(6) "server"
[4] =>
string(4) "side"
[5] =>
string(3) "web"
[6] =>
string(11) "programming"
[7] =>
string(8) "language"
[8] =>
string(1) ","
[9] =>
string(2) "do"
[10] =>
string(3) "you"
[11] =>
string(4) "like"
[12] =>
string(3) "php"
[13] =>
string(1) "?"
[14] =>
string(1) ","
[15] =>
string(2) "do"
[16] =>
string(3) "you"
[17] =>
string(4) "like"
[18] =>
string(10) "javascript"
[19] =>
string(1) "?"
[20] =>
string(4) "what"
[21] =>
string(2) "is"
[22] =>
string(4) "ajax"
[23] =>
string(1) "?"
[24] =>
string(1) "?"
[25] =>
string(6) "coding"
[26] =>
string(2) "is"
[27] =>
string(3) "fun"
[28] =>
string(1) "."
[29] =>
string(0) ""
}
It returns an array of the search terms found directly before a question mark, keyed by their position in the $words array:
array(3) {
[12] =>
string(3) "php"
[18] =>
string(10) "javascript"
[22] =>
string(4) "ajax"
}
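The question also asks for the length of the word that was found. A minimal sketch, assuming the matches look like the array above (hard-coded here so the snippet runs on its own); note that strlen() counts bytes, which equals the character count only for single-byte text:

```php
<?php
// Matches as shown above: position in $words => matched token.
// Hard-coded for illustration only.
$question_words = [12 => 'php', 18 => 'javascript', 22 => 'ajax'];

// array_map() preserves the keys when it is given exactly one array,
// so each length stays associated with the token's position.
$lengths = array_map('strlen', $question_words);

print_r($lengths);
```

Because the keys are kept, $lengths[12] is the length of the token at position 12 in $words.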
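This assumes you are not using search terms that contain punctuation, such as node.js, although you could easily adapt this approach to handle that case.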
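One possible way to handle punctuated terms is to skip tokenization and take the regular-expression route mentioned at the start of this answer. The following is only a sketch: the sample string and term list are made up for illustration, and preg_quote() is used so that characters such as "." in a term are matched literally.

```php
<?php
// Sample data, invented for this sketch.
$string = 'Do you like Node.js? I like PHP , but do you like PHP ?';
$find = ['node.js', 'php', 'html'];

// Escape every term (including the "/" delimiter) and join them with "|".
$alternatives = implode('|', array_map(
    function ($term) {
        return preg_quote($term, '/');
    },
    $find
));

// A term that does not continue a longer word, followed by optional
// whitespace and a question mark. The /i flag makes it case-insensitive.
$pattern = '/(?<!\w)(' . $alternatives . ')(?=\s*\?)/i';

preg_match_all($pattern, $string, $matches, PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE each match is a [text, byte offset] pair.
foreach ($matches[1] as [$word, $offset]) {
    printf("%s at byte offset %d, length %d\n", $word, $offset, strlen($word));
}
```

The first "PHP" is skipped because it is followed by a comma rather than a question mark, while the lookahead still allows whitespace between the term and the "?".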
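It also assumes you have no multi-word search terms, such as amazon s3. Instead of using array_intersect(), you could iterate over the question-mark tokens with array_keys($words, '?') and check the tokens preceding each one against your search terms, based on how many words each term contains.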
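The array_keys() idea can be sketched roughly as follows. The token list is hard-coded in the shape the tokenization step produces, and the names $term_tokens and $matches are made up for this example:

```php
<?php
// Tokens in the shape produced by the tokenization step
// (lowercased, punctuation split off). Hard-coded for illustration.
$words = ['do', 'you', 'use', 'amazon', 's3', '?', 'i', 'like', 'php', '?', 'really', '?'];
$find = ['amazon s3', 'php', 'html'];

$matches = [];
// Visit the position of every question-mark token (strict comparison).
foreach (array_keys($words, '?', true) as $q) {
    foreach ($find as $term) {
        $term_tokens = explode(' ', $term);
        $n = count($term_tokens);
        // Skip terms that are longer than the text before this "?".
        if ($n > $q) {
            continue;
        }
        // Compare the $n tokens directly before the "?" with the term.
        if (array_slice($words, $q - $n, $n) === $term_tokens) {
            // Key the match by the position of its first token.
            $matches[$q - $n] = $term;
        }
    }
}

print_r($matches);
```

Splitting each term on spaces means single-word and multi-word terms go through the same comparison, at the cost of re-splitting the terms for every question mark.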