Question

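I have a directory of text files. I want to loop through every text file in the directory and get the overall count of unique words (the vocabulary size), not for each individual file but for all files together. In other words, I want the number of unique words across all files combined, not the number of unique words in each separate file.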

For example, I have three text files in a directory. Here are their contents:

file1.txt -> here is some text.

file2.txt -> here is more text.

file3.txt -> even more text.

So in this case, the unique word count for this directory of text files is 6.

I tried using this code:

$files = glob("C:\\wamp\\dir");

$out = fopen("mergedFiles.txt", "w");


  foreach($files as $file){
      $in = fopen($file, "r");
      while ($line = fread($in)){
           fwrite($out, $line);
      }
      fclose($in);
  }


  fclose($out);

The idea was to merge all the text files; after running this code, I planned to use array_unique() on mergedFiles.txt. However, the code does not work.

What is the best way to get the unique word count across all the text files in a directory?

Answer 1

You can try this:

$allWords = array();

foreach (glob("*.txt") as $filename) // loop on each file
{
    $contents = file_get_contents($filename); // Get file contents
    $words = explode(' ', $contents); // Make an array with words

    if ( $words )
        $allWords = array_merge($allWords, $words); // combine global words array and file words array
}

var_dump(count(array_unique($allWords)));
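One caveat with the snippet above: explode(' ', ...) splits on single spaces only, so words separated by newlines or tabs stay glued together, and array_merge() keeps every duplicate in memory until the final array_unique() call. A minimal sketch of one alternative, assuming it is acceptable to split on any run of whitespace and to track words as array keys (countUniqueWords is a hypothetical helper name, not part of the original answer):

```php
<?php
// Hypothetical helper: counts distinct whitespace-separated tokens
// across all the given files. Punctuation and letter case are left as-is.
function countUniqueWords(array $filenames): int
{
    $seen = []; // each distinct word is stored once, as an array key

    foreach ($filenames as $filename) {
        $contents = file_get_contents($filename);
        if ($contents === false) {
            continue; // skip files that could not be read
        }

        // Split on any run of whitespace (spaces, tabs, newlines)
        // and drop empty pieces.
        $words = preg_split('/\s+/', $contents, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            $seen[$word] = true;
        }
    }

    return count($seen);
}
```

It could be called as countUniqueWords(glob("*.txt") ?: []), since glob() may return false on error.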

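Edit: another version that does the following:

removes dots
collapses multiple spaces
matches words even when the space is missing between the end of one sentence and the start of the next.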

function removeDot($string) {
    return rtrim($string, '.');
}

$words = explode(' ', preg_replace('#\.([a-zA-Z])#', '. $1', preg_replace('/\s+/', ' ',$contents)));
$words = array_map("removeDot", $words);
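Putting the pieces together, the whole thing might look roughly like the sketch below. The two regular expressions and the rtrim() call are the ones shown above; the function names extractWords and countUniqueWordsInDir, the trim() call, and the filtering of empty strings are my own assumptions added for illustration. The arrow functions require PHP 7.4 or later.

```php
<?php
// Hypothetical helper: turns one file's contents into a list of words.
function extractWords(string $contents): array
{
    // Collapse every run of whitespace (including newlines) into one space.
    $text = preg_replace('/\s+/', ' ', trim($contents));
    // Insert the missing space in cases like "end.Next".
    $text = preg_replace('#\.([a-zA-Z])#', '. $1', $text);
    // Split on spaces and strip trailing dots from each word.
    $words = array_map(fn($w) => rtrim($w, '.'), explode(' ', $text));
    // Drop empty strings (for example from an empty file or a lone ".").
    return array_values(array_filter($words, fn($w) => $w !== ''));
}

// Hypothetical helper: unique word count over every .txt file in $dir.
function countUniqueWordsInDir(string $dir): int
{
    $allWords = [];

    // glob() can return false on error, so fall back to an empty list.
    foreach (glob($dir . '/*.txt') ?: [] as $filename) {
        $contents = file_get_contents($filename);
        if ($contents !== false) {
            $allWords = array_merge($allWords, extractWords($contents));
        }
    }

    return count(array_unique($allWords));
}
```

Note that this is still case-sensitive ("Text" and "text" count as two words) and only strips dots, not commas or other punctuation; wrapping the contents in strtolower() or widening the rtrim() character list would be the obvious next step if that matters for your data.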
