Question

Guys, I have a text file and I want to remove the lines that contain specific words.

 <?php
// set source file name and path
$source = "problem.txt";

// read raw text as array
$raw = file($source) or die("Cannot read file");

Now I have an array. I want to remove some lines from it and then use the remaining ones.

Answer 1

Since each line of your file ends up as one element of an array, the array_filter function might interest you (quoting):

array array_filter  ( array $input  [, callback $callback  ] )

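Iterates over each value in the input array passing them to the callback function.
If the callback function returns true, the current value from input is returned into the result array. Array keys are preserved.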

And you can use strpos or stripos to determine whether a string is contained in another one.

For instance, let's suppose we have this array:

$arr = array(
  'this is a test',
  'glop test',
  'i like php',
  'a badword, glop is', 
);

We could define a callback function that filters out the lines containing "glop":

function keep_no_glop($line) {
  if (strpos($line, 'glop') !== false) {
    return false;
  }
  return true;
}

And use that function with array_filter:

$arr_filtered = array_filter($arr, 'keep_no_glop');
var_dump($arr_filtered);

And we'd get this kind of output:

array
  0 => string 'this is a test' (length=14)
  2 => string 'i like php' (length=10)

i.e. we have removed all the lines containing the "badword" "glop".
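
Note that the keys in that output are 0 and 2: array_filter preserves the original keys. If you need consecutive indexes afterwards (for a plain for loop, say), a minimal sketch, reusing the same sample data, is to pass the result through array_values:

```php
<?php
// Same sample lines as above.
$arr = array(
  'this is a test',
  'glop test',
  'i like php',
  'a badword, glop is',
);

// Keep only the lines that do not contain "glop".
function keep_no_glop($line) {
  return strpos($line, 'glop') === false;
}

// array_filter keeps the original keys (0 and 2 here);
// array_values renumbers them starting from 0.
$arr_reindexed = array_values(array_filter($arr, 'keep_no_glop'));
var_dump($arr_reindexed);
```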

Of course, now that you have the basic idea, nothing prevents you from using a more complex callback function ;-)

Edit after the comments: here is a full portion of code that should work:

First of all, you have your list of lines:

$arr = array(
  'this is a test',
  'glop test',
  'i like php',
  'a badword, glop is', 
);

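Then, you load the list of bad words from a file.
You trim each line and remove the empty ones, to make sure you only end up with "words" in the $bad_words array, and not blank entries that would cause trouble.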

$bad_words = array_filter(array_map('trim', file('your_file_with_bad_words.txt')));
var_dump($bad_words);

With my test file, the $bad_words array contains:

array
  0 => string 'glop' (length=4)
  1 => string 'test' (length=4)

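Then, the callback function, which loops over that array of bad words:

Note: using a global variable is not that nice :-( But the callback function called by array_filter doesn't get any other parameter, and I didn't want to load the file each time the callback function is called.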

function keep_no_glop($line) {
  global $bad_words;
  foreach ($bad_words as $bad_word) {
      if (strpos($line, $bad_word) !== false) {
        return false;
      }
  }
  return true;
}

And, as before, you can use array_filter to filter the lines:

$arr_filtered = array_filter($arr, 'keep_no_glop');
var_dump($arr_filtered);

Which, this time, gives you:

array
  2 => string 'i like php' (length=10)
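
If you are on PHP 5.3 or later, one way to avoid the global variable is an anonymous function that captures $bad_words with use. This is only a sketch: the bad words are hard-coded here instead of being read from the file, to keep the example self-contained.

```php
<?php
$arr = array(
  'this is a test',
  'glop test',
  'i like php',
  'a badword, glop is',
);

// Hard-coded for the example; in practice this would come from the file.
$bad_words = array('glop', 'test');

// The closure receives $bad_words through "use", so no global is needed.
$arr_filtered = array_filter($arr, function ($line) use ($bad_words) {
  foreach ($bad_words as $bad_word) {
    if (strpos($line, $bad_word) !== false) {
      return false; // drop the line
    }
  }
  return true; // keep the line
});
var_dump($arr_filtered);
```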

Answer 2

$file=file("problem.txt");
$a = preg_grep("/martin|john/",$file,PREG_GREP_INVERT );
print_r($a);

Answer 3

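Check out the strpos function. It can tell you whether a string contains another string (and exactly where the second string occurs in the first). You would use it like this: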

$good = array();
$bad_words = array('martin', 'methew');

// for every line in the file
foreach($raw as $line) {
  // check for each word we want to avoid
  foreach($bad_words as $word) {
    // if this line has a trigger word
    if(strpos($line, $word) !== false) {
      // skip it and start processing the next
      continue 2;
    }
  }

  // no triggers hit, line is clean
  $good[] = $line;
}

Now, in $good, you'll have a list containing only the clean lines.
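
Keep in mind that strpos is case-sensitive, so a line with "Martin" would slip through the loop above. If that matters, stripos is the case-insensitive variant. A small sketch of the same loop, with made-up lines standing in for $raw:

```php
<?php
// Made-up lines standing in for the contents of $raw.
$raw = array("Martin called\n", "all good here\n", "METHEW replied\n");
$bad_words = array('martin', 'methew');

$good = array();
foreach ($raw as $line) {
  foreach ($bad_words as $word) {
    // stripos ignores case, so "Martin" and "METHEW" are caught too
    if (stripos($line, $word) !== false) {
      continue 2;
    }
  }
  // no trigger word found, keep the line
  $good[] = $line;
}
print_r($good);
```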

Answer 4

This will remove all the rows that contain a blacklisted word:

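// FILE_IGNORE_NEW_LINES strips the line endings, so implode("\n") below does not double them
$rows = file("problem.txt", FILE_IGNORE_NEW_LINES);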
$blacklist = "foo|bar|lol";

foreach($rows as $key => $row) {
    if(preg_match("/($blacklist)/", $row)) {
        unset($rows[$key]);
    }
}

file_put_contents("solved.txt", implode("\n", $rows));

Or, if you're on PHP 5.3, you can use a lambda function with array_filter:

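$rows = file("problem.txt", FILE_IGNORE_NEW_LINES);
$blacklist = "foo|bar|lol";
// "use" imports $blacklist into the closure; keep only the rows that do NOT match
$rows = array_filter($rows, function($row) use ($blacklist) {
    return !preg_match("/($blacklist)/", $row);
});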

file_put_contents("solved.txt", implode("\n", $rows));

Prior to PHP 5.3, a solution using array_filter would actually take more lines than the first solution I posted, so I'll leave that out.
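
One thing to watch with the blacklist string: the words go into the pattern as-is, so a word containing a regex metacharacter (a dot, a plus, a slash...) changes the pattern or breaks it. A hedged sketch of building the pattern from an array with preg_quote instead; the word list, the sample rows, and the variable names are made up for the example:

```php
<?php
// Hypothetical blacklist; "c++" and "a/b" would break a hand-written pattern.
$words = array('foo', 'c++', 'a/b');

// Escape each word, passing "/" as the delimiter so it gets escaped too.
$quoted = array();
foreach ($words as $word) {
    $quoted[] = preg_quote($word, '/');
}
$pattern = '/(' . implode('|', $quoted) . ')/';

// Sample rows standing in for the contents of the file.
$rows = array('I like foo', 'c++ is fine', 'path a/b here', 'clean row');

// PREG_GREP_INVERT returns the rows that do NOT match the pattern.
$clean = preg_grep($pattern, $rows, PREG_GREP_INVERT);
print_r($clean);
```

Like array_filter, preg_grep keeps the original keys of the rows it returns.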

Answer 5

If you have a long string instead of a file, and you want to remove every line of that string that contains specific words, you can use this:

$string="I have a long string\n
  That has good words inside.\n
   I love my string.\n
  //add some words here\n";
$rows = explode("\n",$string);
$unwanted = "tring|\/\/";
$cleanArray= preg_grep("/$unwanted/i",$rows,PREG_GREP_INVERT);
$cleanString=implode("\n",$cleanArray);
print_r ( $cleanString );

This removes the lines containing "tring" and "//".

Answer 6

Assuming you have an array of "bad words":

<?php
foreach ($raw as $key=>$line)
{
    foreach ($badwords as $w)
    {
        if ( strpos($line, $w) !== false )
            unset($raw[$key]);
    }
}
?>

Answer 7

<?php
$source = "problem.txt";
$raw = file_get_contents($source) or die("Cannot read file");
$wordlist = "martin|methew|asshole";
// the "e" modifier is dropped: it was removed in PHP 7 and is not needed here
$raw = preg_replace("/($wordlist)/i", "", $raw);
file_put_contents($source, $raw);
?>
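
Note that this replacement only blanks out the matched words; the rest of each line stays in the file. If the goal is to drop the whole line while still working on a single string, one possible sketch is to anchor the pattern to line starts with the m modifier. The sample text is invented for the example:

```php
<?php
// Invented sample text standing in for the contents of problem.txt.
$raw = "hello world\nmartin was here\nnothing to see\nMethew too\n";
$wordlist = "martin|methew";

// "m" makes ^ match at the start of every line; "i" ignores case.
// .* does not cross newlines, and \R? also consumes the line break.
$raw = preg_replace("/^.*(?:$wordlist).*\R?/mi", "", $raw);
echo $raw;
```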

Remove lines containing a specific word/phrase with PHP
