Consider the following string:
I have had the greatest {A} {B} day yesterday {C}
I want to build an array of bigrams from it, ignoring all labels (a label is anything between {braces}):
[0] => I-have
[1] => have-had
[2] => had-the
[3] => the-greatest
[4] => greatest-day
[5] => day-yesterday
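What is the best way to do this in PHP? Use a regular expression, or explode on " " and then iterate over all the words? I'm having trouble getting started, so any help would be much appreciated :)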
Answer 0 (score: 2)
This is easy to do with explode:
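$string = "I have had the greatest {A} {B} day yesterday {C}";
// Split on spaces, then keep only the words that are not {labels}
$words = explode(" ", $string);
$filtered_words = array();
foreach ($words as $w)
{
    if (!preg_match('/\{.*\}/', $w))
    {
        array_push($filtered_words, $w);
    }
}
// Join each word with its successor. A plain for loop is used because
// range(0, -1) would count downwards when fewer than two words remain.
$output = array();
for ($i = 0; $i < count($filtered_words) - 1; $i++)
{
    array_push($output, $filtered_words[$i] . "-" . $filtered_words[$i + 1]);
}
var_dump($output);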
This outputs:
array(6) {
[0]=>
string(6) "I-have"
[1]=>
string(8) "have-had"
[2]=>
string(7) "had-the"
[3]=>
string(12) "the-greatest"
[4]=>
string(12) "greatest-day"
[5]=>
string(13) "day-yesterday"
}
Answer 1 (score: 1)
A slightly different approach:
$string = '{D} I have had the greatest {A} {B} day yesterday {C}';
// explode on spaces
$arr = explode(' ', $string);
$bigrams = array();
// remove all "labels" with regex (assuming a label is one or more \w characters in braces)
$arr = array_values(array_filter($arr, function($s){
    return !preg_match('/\{\w+\}/', $s);
}));
// get the bigrams
$len = count($arr);
for ($i = 0; $i <= $len - 2; $i++) {
    $bigrams[] = $arr[$i] . '-' . $arr[$i+1];
}
print_r($bigrams);
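The question also asks about going the regex route instead of explode. Here is a minimal sketch of that, assuming a label never contains a nested brace and that words are separated by whitespace. The function name bigrams_without_labels is made up for illustration and is not from either answer.

```php
<?php
// Hypothetical helper (name chosen for this example only).
function bigrams_without_labels(string $text): array
{
    // Replace every {label} with a space; [^{}]* stops the match
    // from running across two separate labels.
    $clean = preg_replace('/\{[^{}]*\}/', ' ', $text);

    // Split on runs of whitespace and drop empty pieces.
    $words = preg_split('/\s+/', $clean, -1, PREG_SPLIT_NO_EMPTY);

    $bigrams = [];
    // The loop body never runs when fewer than two words remain.
    for ($i = 0; $i + 1 < count($words); $i++) {
        $bigrams[] = $words[$i] . '-' . $words[$i + 1];
    }
    return $bigrams;
}

print_r(bigrams_without_labels('I have had the greatest {A} {B} day yesterday {C}'));
```

Because the labels are stripped from the whole string before splitting, this version also copes with a label glued to a word (for example "day{C}"), which the explode-then-filter versions would discard together with the word.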