How to get the elements of a string that do not match a given array of elements

Time: 2017-12-20 22:10:18

Tags: php regex string-matching regex-negation

Say I have this blacklist of elements:

$bl = array(
    'sit',
    'consectetur',
    'adipiscing',
    ',',
    '.',
    ' ',
);

And this string:

$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';

What would be a good way to get an array of every element in that string, except for the ones in the blacklist?

3 Answers:

Answer 0: (score: 1)

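It is hard to give a definitive answer with the information given. For example, how do you define an element? Are the space, comma and period in the block list only there so that everything except whole words gets dropped, or are they elements in their own right that you happen not to want? What about other punctuation? Here is one possible solution, but whether it suits you depends on those unknowns: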

$bl = [
    'sit',
    'consectetur',
    'adipiscing',
];
$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';

// REMOVE PUNCTUATION. Add punctuation you want to omit in the brackets
$subject = preg_replace( "#[,.]#", '', $subject );

// GET DIFFERENCE BETWEEN TWO ARRAYS
$array = array_diff(
        preg_split( "#\s+#", $subject ),
        $bl
);

print_r( $array );
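
The snippet above leaves gaps in the array keys, because array_diff() preserves the keys of its first argument. As a minimal sketch (the function name words_not_in_list is hypothetical, and the punctuation set is assumed to be just comma and period), the same steps can be wrapped in a helper that returns a reindexed list:

```php
<?php
// Hypothetical helper; the name and signature are illustrative only.
function words_not_in_list(string $subject, array $blocked): array
{
    // Strip unwanted punctuation (assumed here: comma and period only).
    $clean = preg_replace('#[,.]#', '', $subject);
    // Split on runs of whitespace and drop empty pieces.
    $words = preg_split('#\s+#', $clean, -1, PREG_SPLIT_NO_EMPTY);
    // array_diff() keeps the original keys, so reindex with array_values().
    return array_values(array_diff($words, $blocked));
}

$result = words_not_in_list(
    'Lorem ipsum dolor sit amet, consectetur adipiscing elit.',
    ['sit', 'consectetur', 'adipiscing']
);
print_r($result);
```

The comparison in array_diff() is case-sensitive, so 'Sit' would not be removed by a blocked entry 'sit'.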

Answer 1: (score: 1)

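Do it all with a single call to preg_split(). The pattern needs a little preparation first, so that every entry in the blacklist array is handled correctly and treated as a delimiter. The fourth argument of the call ensures that empty elements are discarded from the output array.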

Code:

$blacklist = ['sit', 'consectetur', 'adipiscing', ',', '.', ' '];
$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';

foreach ($blacklist as &$entry) {  // & means modify by reference: $blacklist itself (not a copy) is altered
    if (ctype_alpha($entry)) {
        $entry = "\b{$entry}\b"; // wrap in word boundaries (str_replace doesn't offer this accuracy)
    } else {
        $entry = "\Q{$entry}\E"; // treat non-word entries literally; no word boundaries required
    }
}
unset($entry); // break the reference so later writes to $entry cannot change $blacklist
$pattern = '/' . implode('|', $blacklist) . '/i';
// generates: '/\bsit\b|\bconsectetur\b|\badipiscing\b|\Q,\E|\Q.\E|\Q \E/i'
var_export(preg_split($pattern, $subject, -1, PREG_SPLIT_NO_EMPTY));  // each blacklisted entry acts as a delimiter; -1 means no limit

Output:

array (
  0 => 'Lorem',
  1 => 'ipsum',
  2 => 'dolor',
  3 => 'amet',
  4 => 'elit',
)
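
A \Q...\E wrapper breaks if a blacklist entry contains the pattern delimiter or the sequence \E itself. As a hedged variation on the same idea (not the answerer's code), the pattern can be built with preg_quote(), which escapes regex metacharacters and, when given a second argument, the delimiter too:

```php
<?php
$blacklist = ['sit', 'consectetur', 'adipiscing', ',', '.', ' '];
$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';

$parts = [];
foreach ($blacklist as $entry) {
    // Escape regex metacharacters and the "/" delimiter.
    $quoted = preg_quote($entry, '/');
    // Add word boundaries only to purely alphabetic entries.
    $parts[] = ctype_alpha($entry) ? '\b' . $quoted . '\b' : $quoted;
}
$pattern = '/' . implode('|', $parts) . '/i';

// Every blacklisted entry acts as a delimiter; empty pieces are dropped.
$tokens = preg_split($pattern, $subject, -1, PREG_SPLIT_NO_EMPTY);
print_r($tokens);
```

Building a separate $parts array also avoids the by-reference loop, so the original $blacklist stays untouched.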

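As an inflexible, non-regex alternative, you can rely on the magic of str_word_count(). It produces the desired output for your sample input because of the way str_word_count() treats punctuation. array_diff() then simply removes every element of the first array that also appears in the second: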

Code:

$bl = array('sit', 'consectetur','adipiscing', ',', '.', ' ');  // the , . and space elements could be removed here
$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';

// If you use strtolower() on $subject you get case-insensitive comparisons, but the output values are modified too.
// That is just one reason why this is not a robust method.
var_export(array_diff(str_word_count($subject, 1), $bl));

Output:

array (
  0 => 'Lorem',
  1 => 'ipsum',
  2 => 'dolor',
  4 => 'amet',
  7 => 'elit',
)
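
If case-insensitive matching matters and the original casing of the output must survive, one possible workaround is array_udiff() with strcasecmp as the comparison callback. This is only a sketch, with a mixed-case blacklist assumed for illustration; it inherits the other limitations of str_word_count():

```php
<?php
// Mixed-case blacklist, assumed here to demonstrate case-insensitive matching.
$bl = ['SIT', 'Consectetur', 'adipiscing'];
$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';

// Format 1 returns the words of the string as an array.
$words = str_word_count($subject, 1);

// array_udiff() compares values with the callback; strcasecmp ignores case.
// array_values() reindexes, since the keys of $words are preserved.
$kept = array_values(array_udiff($words, $bl, 'strcasecmp'));
print_r($kept);
```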

Answer 2: (score: 0)

There are many ways to do this! Here is how I would do it:

<?php
 $bl = array('sit', 'consectetur','adipiscing', ',', '.', ' ');
 $subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';
 $a_subject = explode(' ', $subject);
 $a_subject_filtered = array_filter($a_subject, function($item) use ($bl){
     return !in_array($item, $bl);
 });
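
Note that explode(' ', ...) leaves punctuation attached to the neighbouring word, so the filtered array above still contains 'amet,' and 'elit.'. A minimal sketch of one way around that, assuming only commas and periods need stripping, is to trim each token before filtering:

```php
<?php
$bl = ['sit', 'consectetur', 'adipiscing', ',', '.', ' '];
$subject = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';

// Split on single spaces, then strip leading/trailing commas and periods.
$tokens = array_map(function ($token) {
    return trim($token, ',.');
}, explode(' ', $subject));

// Drop empty tokens and anything on the blacklist (strict comparison).
$filtered = array_values(array_filter($tokens, function ($item) use ($bl) {
    return $item !== '' && !in_array($item, $bl, true);
}));
print_r($filtered);
```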