Question

I created an array from a file and then parse that file's contents. I already use if(strlen($value) < 4): unset($content[$key]); endif; to filter out words shorter than 4 characters.

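My question is this: I want to remove common words from the array, but there are a lot of them. Rather than repeating these checks on every array value, I would like to know whether there is a more efficient way to do this.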

Below is a sample of the code I am currently using. The list could get very large, and I think there must be a better (more efficient) way?

foreach ($content as $key=>$value) {
    if(strlen($value) < 4): unset($content[$key]); endif; 
    if($value == 'that'): unset($content[$key]); endif;
    if($value == 'have'): unset($content[$key]); endif;
    if($value == 'with'): unset($content[$key]); endif;
    if($value == 'this'): unset($content[$key]); endif;
    if($value == 'your'): unset($content[$key]); endif;
    if($value == 'will'): unset($content[$key]); endif;
    if($value == 'they'): unset($content[$key]); endif;
    if($value == 'from'): unset($content[$key]); endif;
    if($value == 'when'): unset($content[$key]); endif;
    if($value == 'then'): unset($content[$key]); endif;
    if($value == 'than'): unset($content[$key]); endif;
    if($value == 'into'): unset($content[$key]); endif;
}

Answer 1

Maybe this would be better:

$filter = array("that", "have", "with" /* ... */);

foreach ($content as $key=>$value) {
   if (in_array($value,$filter)){
      unset($content[$key]);
   }
}

Answer 2

This is how I do it:

$excluded_words = array( 'that','have','with','this','your','will','they','from','when','then','than','into');
$replace = array_fill_keys($excluded_words,'');
echo str_replace(array_keys($replace),$replace,'some words that have to be with this your will they have from when then that into replaced');

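How it works: build an array filled with empty strings whose keys are the substrings to remove/replace. Then just call str_replace, passing the keys as the first argument and the array itself as the second. The result in this case is: some words to be replaced (the spaces that surrounded the removed words stay in the string). This code has been tested and works.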

When dealing with an array, just implode it with some unusual delimiter (%@%@% or whatever), run str_replace on it, explode it again, and Bob's your uncle.

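As for replacing all the short words (which I forgot in my original answer), that is the kind of thing regular expressions are good at. I would say something like preg_replace('/(\b|[^a-z])[a-z]{1,3}(\b|[^a-z])/i','$1$2',implode(',',$targetArray)); or something along those lines. You will want to test it, because this is off the top of my head and untested, but it should be enough to get you started.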
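The round trip described above (join the array, replace, split again) can be sketched with a single regular expression that uses \b word boundaries, so an excluded word is only removed when it stands alone and not when it is part of a longer word. This is only a sketch under assumptions: the sample data is made up, and it assumes the entries are plain words without spaces.

```php
<?php
// Hypothetical sample data standing in for the parsed file contents.
$content = array('some', 'words', 'that', 'have', 'to', 'be', 'within', 'this', 'replaced');
$excluded_words = array('that', 'have', 'with', 'this');

// Join with spaces so that \b separates the individual entries.
$joined = implode(' ', $content);

// Build one alternation of the excluded words, escaping regex metacharacters.
$alternation = implode('|', array_map(function ($w) {
    return preg_quote($w, '/');
}, $excluded_words));

// Remove whole excluded words and any standalone word of 1 to 3 letters.
$cleaned = preg_replace('/\b(?:' . $alternation . '|[a-z]{1,3})\b/i', '', $joined);

// Split on whitespace again, dropping the empty pieces left by removed words.
$result = preg_split('/\s+/', $cleaned, -1, PREG_SPLIT_NO_EMPTY);

print_r($result);
```

Note that 'within' survives even though it contains 'with', which a plain str_replace would have cut down to 'in'. The result is reindexed from 0, unlike the unset() approaches, which keep the original keys.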

Answer 3

I would probably do it like this:

$aCommonWords = array('that','have','with','this','yours','etc.....');

foreach($content as $key => $value){
    if(in_array($value,$aCommonWords)){
        unset($content[$key]);
    }
}

Answer 4

Create an array of the words to remove and check whether the value is in that array:

$excluded_words = array( 'that','have','with','this','your','will','they','from','when','then','than','into');

and in the foreach:

if (in_array($value, $excluded_words)) unset($content[$key]);

Answer 5

Another possible solution:

$arr = array_flip(array( 'that', 'have', 'with', 'this', 'your', 'will', 
        'they', 'from', 'when', 'then', 'than', 'into' ));
foreach ($content as $key=>$value) {
    if(strlen($value) < 4 || isset($arr[$value])) {
        unset($content[$key]);
    }
}
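The array_flip() call turns the words into array keys, so isset() can do a hash lookup instead of in_array() scanning the whole list for every word. A minimal sketch that wraps the same idea in a reusable function follows; the function name remove_common_words and the $minLength parameter are hypothetical and only there for illustration.

```php
<?php
// Hypothetical helper; the name and the $minLength parameter are illustrative.
function remove_common_words(array $content, array $stopWords, $minLength = 4)
{
    // Flip once so each word becomes a key; isset() is then a hash lookup.
    $lookup = array_flip($stopWords);

    $kept = array();
    foreach ($content as $key => $value) {
        // Keep words that are long enough and not in the stop list.
        if (strlen($value) >= $minLength && !isset($lookup[$value])) {
            $kept[$key] = $value; // original keys are preserved
        }
    }
    return $kept;
}

// Made-up sample input.
$content = array('here', 'are', 'some', 'words', 'that', 'will', 'be', 'filtered');
$result = remove_common_words($content, array('that', 'have', 'here', 'will'));
print_r($result);
```

Building a new array instead of calling unset() inside the loop is a design choice here, not a requirement; it leaves the caller's array untouched.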

Answer 6

Use array_diff():

$content = array('here','are','some','words','that','will','be','filtered');
$filter = array('that','have','here','are','will','they','from','when','then');
$result = array_diff($content, $filter);

Result:

Array
(
    [2] => some
    [3] => words
    [6] => be
    [7] => filtered
)

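Or, if you want more flexibility in what gets filtered (for example, you mentioned needing to filter out words shorter than 4 characters), you can use array_filter():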

$result = array_filter($content, function($v) use ($filter) {
    return !in_array($v, $filter) && strlen($v) >= 4;
});

Result:

Array
(
    [2] => some
    [3] => words
    [7] => filtered
)
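Both array_diff() and array_filter() keep the original keys, which is why the printed results show gaps such as [2], [3], [7]. If consecutive keys are needed afterwards, array_values() renumbers them. A short sketch using the same sample data:

```php
<?php
$content = array('here', 'are', 'some', 'words', 'that', 'will', 'be', 'filtered');
$filter = array('that', 'have', 'here', 'are', 'will', 'they', 'from', 'when', 'then');

// array_diff() keeps the original keys (2, 3, 6 and 7 here).
$result = array_diff($content, $filter);

// array_values() discards those keys and renumbers from 0.
$reindexed = array_values($result);
print_r($reindexed);
```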

Answer 7

$var = array('abb', 'bffb', 'cbbb', 'dddd', 'dddd', 'f', 'g');
$var= array_unique($var);
foreach($var as $val){
    echo $val. " ";
}

The simplest way.

An efficient way to remove selected words from a PHP array