Say I have this blacklist of elements:
$bl = array(
'sit',
'consectetur',
'adipiscing',
',',
'.',
' ',
);
and this string:
$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';
What is a good way to get an array of every element in that string, except the ones that are in the blacklist?
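Answer 0 (score: 1)
It's hard to give a definitive answer with the information given. For example, how do you define an element? Are the spaces, commas and periods in the blacklist there only so that everything except the words gets omitted, or are they elements that you just happen not to want? What about other punctuation? Here is one possible solution, but whether it suits you depends on those unknowns: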
$bl = [
'sit',
'consectetur',
'adipiscing',
];
$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';
// REMOVE PUNCTUATION. Add punctuation you want to omit in the brackets
$subject = preg_replace( "#[,.]#", '', $subject );
// GET DIFFERENCE BETWEEN TWO ARRAYS
$array = array_diff(
preg_split( "#\s+#", $subject ),
$bl
);
print_r( $array );
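On the "what about other punctuation?" question: if an element is taken to mean a run of letters or digits (an assumption, not something the question states), you can split on every run of non-letter, non-digit characters instead of listing punctuation by hand. A minimal sketch of that variant, which also renumbers the keys with array_values():

```php
<?php
// Sketch, assuming an "element" is a run of Unicode letters or digits,
// so every kind of punctuation and whitespace acts as a separator.
$bl = ['sit', 'consectetur', 'adipiscing'];
$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';

// Split on any run of characters that are neither letters nor digits (/u = UTF-8 aware).
// PREG_SPLIT_NO_EMPTY drops the empty piece left after the trailing period.
$words = preg_split('/[^\p{L}\p{N}]+/u', $subject, -1, PREG_SPLIT_NO_EMPTY);

// Remove blacklisted words, then renumber the keys from 0.
$result = array_values(array_diff($words, $bl));

print_r($result);
```

Note that array_diff() compares case-sensitively, so 'Sit' would not be removed by a blacklist entry 'sit'.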
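Answer 1 (score: 1)
Do everything with a single call to preg_split(). The pattern needs some preparation first: each entry of the blacklist array is turned into a regex fragment, so that every entry is effectively treated as a delimiter. The call's 4th argument (the PREG_SPLIT_NO_EMPTY flag) ensures that empty elements are discarded from the output array.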
Code:
$blacklist=['sit','consectetur','adipiscing',',','.',' '];
$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';
foreach($blacklist as &$entry){ // & means modify by reference, this means $blacklist (not its copy) will be altered
if(ctype_alpha($entry)){
$entry="\b{$entry}\b"; // wrap in word boundaries (str_replace doesn't offer this accuracy)
}else{
$entry="\Q{$entry}\E"; // make non-word literal, no word boundaries required
}
}
unset($entry); // break the reference left over from the foreach loop
$pattern='/'.implode('|',$blacklist).'/i';
// generates: '/\bsit\b|\bconsectetur\b|\badipiscing\b|\Q,\E|\Q.\E|\Q \E/i'
var_export(preg_split($pattern,$subject,-1,PREG_SPLIT_NO_EMPTY)); // each blacklisted entry is treated as a delimiter; -1 means "no limit" (passing NULL is deprecated in PHP 8.1+)
Output:
array (
0 => 'Lorem',
1 => 'ipsum',
2 => 'dolor',
3 => 'amet',
4 => 'elit',
)
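A possible variant of the pattern-building step (not what this answer used) escapes each entry with preg_quote() instead of wrapping it in \Q...\E. As far as I know, \Q...\E does not protect the pattern delimiter, so an entry containing '/' would break the pattern, whereas preg_quote() escapes the delimiter passed as its second argument. Building a new array also avoids the foreach-by-reference. A sketch, assuming the same blacklist and subject:

```php
<?php
// Sketch: build the alternation with preg_quote() instead of \Q...\E.
$blacklist = ['sit', 'consectetur', 'adipiscing', ',', '.', ' '];
$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';

$parts = [];
foreach ($blacklist as $entry) {
    // Escape regex metacharacters and the '/' delimiter.
    $quoted = preg_quote($entry, '/');
    // Whole words get word boundaries; punctuation and spaces are matched literally.
    $parts[] = ctype_alpha($entry) ? '\b' . $quoted . '\b' : $quoted;
}
// Generates: /\bsit\b|\bconsectetur\b|\badipiscing\b|,|\.| /i
$pattern = '/' . implode('|', $parts) . '/i';

// Each blacklisted entry acts as a delimiter; empty pieces are dropped.
$result = preg_split($pattern, $subject, -1, PREG_SPLIT_NO_EMPTY);
var_export($result);
```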
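As a less flexible alternative that avoids regex, you can take advantage of the magic of str_word_count(): because of the way str_word_count() treats punctuation, this gives the desired output for your sample input data. array_diff() then keeps only the words that do not appear in the blacklist (the original keys are preserved):
Code: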
$bl = array('sit', 'consectetur','adipiscing', ',', '.', ' '); // the ',', '.' and ' ' elements could be removed here; str_word_count() never returns them
$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';
var_export(array_diff(str_word_count($subject,1),$bl)); // if you use strtolower() on $subject, you get case-insensitive comparisons, but your output values are lowercased too. :( This is just one reason why this is not a robust method.
Output:
array (
0 => 'Lorem',
1 => 'ipsum',
2 => 'dolor',
4 => 'amet',
7 => 'elit',
)
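On the case-sensitivity complaint in the comment above: one way to compare case-insensitively without altering the output values is array_udiff() with strcasecmp() as the comparison callback. This is a sketch of that swapped-in technique, not part of the original answer; the mixed-case blacklist is made up to show the effect, and strcasecmp() only folds ASCII letter case.

```php
<?php
// Sketch: case-insensitive blacklist check that keeps the original casing of the output.
$bl = ['SIT', 'Consectetur', 'adipiscing']; // mixed case on purpose, for illustration
$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';

// Format 1 makes str_word_count() return the list of words it finds.
$words = str_word_count($subject, 1);

// array_udiff() compares values with a callback; strcasecmp() returns 0 when
// two strings are equal ignoring ASCII case. Keys of $words are preserved.
$result = array_udiff($words, $bl, 'strcasecmp');

var_export($result);
```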
Answer 2 (score: 0)
There are lots of ways to do this! Here's how I'd do it:
<?php
$bl = array('sit', 'consectetur','adipiscing', ',', '.', ' ');
$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';
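// Strip commas and periods first; otherwise 'amet,' and 'elit.' would not match anything in $bl
$a_subject = explode(' ', str_replace(array(',', '.'), '', $subject));
// Keep only the words that are not in the blacklist (strict comparison)
$a_subject_filtered = array_filter($a_subject, function($item) use ($bl){
return !in_array($item, $bl, true);
});
print_r($a_subject_filtered);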