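I have this code that ranks words from text files. It opens a file and outputs an array showing how many times each word occurs in it. That part works well, but in the second part the code goes through every text file in a given folder and should output how many times each word occurs as a total across all files. The problem is that the output array is not a combined total. There are duplicates. For example, I get: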
the -- 2
quick -- 1
brown -- 1
fox -- 1
jumped -- 1
over -- 1
lazy -- 1
dog -- 1
dog -- 2
a -- 2
lazy -- 1
fox -- 1
cannot -- 1
catch -- 1
fast -- 1
the -- 1
may -- 1
be -- 1
instead of:
the -- 3
dog -- 3
fox -- 2
lazy -- 2
a -- 2
quick -- 1
brown -- 1
jumped -- 1
over -- 1
very -- 1
cannot -- 1
catch -- 1
fast -- 1
may -- 1
be -- 1
Here is the whole code:
<?php
echo "<h3>Word Rank From One File</h3>";
$counted = strtolower(file_get_contents("docs/one.txt"));
$wordArray = preg_split('/[^a-z]/', $counted, -1, PREG_SPLIT_NO_EMPTY);
$wordFrequencyArray = array_count_values($wordArray);
/* Sort array from higher to lower, keeping keys */
arsort($wordFrequencyArray);
/* grab Top 10, huh sorted? */
$top10words = array_slice($wordFrequencyArray,0,10);
/* display them */
foreach ($top10words as $topWord => $frequency)
echo "$topWord -- $frequency<br/>";
echo "<h3>Total From All Files</h3>";
$path = realpath('docs');
foreach(glob($path.'/*.*') as $file) {
$counted = strtolower(file_get_contents($file));
$wordArray = preg_split('/[^a-z]/', $counted, -1, PREG_SPLIT_NO_EMPTY);
$wordFrequencyArray = array_count_values($wordArray);
$combine = array_merge($wordFrequencyArray);
/* Sort array from higher to lower, keeping keys */
arsort($wordFrequencyArray);
/* grab Top 10, huh sorted? */
$top10words = array_slice($wordFrequencyArray,0,10);
/* display them */
foreach ($top10words as $topWord => $frequency)
echo "$topWord -- $frequency<br/>";
}
?>
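What am I doing wrong, or not doing? The two sample text files contain:
The quick brown fox jumped over the lazy dog. The dog that the fox jumped over ran so fast.
and
A lazy fox cannot catch a fast dog. The dog may be fast.
I have also noticed that some words are being ignored.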
Answer 0 (score: 1)
You have to collect the words from all the files first, and only then count their frequencies.
$path = realpath('docs'); // same folder as in the question
$wordArrayTotal = [];
/* Collect the words from every file into one list */
foreach (glob($path.'/*.*') as $file) {
    $counted = strtolower(file_get_contents($file));
    $wordArray = preg_split('/[^a-z]/', $counted, -1, PREG_SPLIT_NO_EMPTY);
    $wordArrayTotal = array_merge($wordArrayTotal, $wordArray);
}
/* Count once, after the loop, so each word gets a single combined total */
$wordFrequencyArray = array_count_values($wordArrayTotal);
/* Sort array from higher to lower, keeping keys */
arsort($wordFrequencyArray);
/* Keep only the top 10 words */
$top10words = array_slice($wordFrequencyArray, 0, 10);
/* Display them */
foreach ($top10words as $topWord => $frequency) {
    echo "$topWord -- $frequency<br/>";
}
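If the files are large, merging every word into one big list may use more memory than necessary. A possible alternative, sketched below, is to count each text on its own and add those counts into a running total. The function name `countWordsAcross` and the two sample sentences are made up for illustration and are not from the original post.

```php
<?php
// Hypothetical helper (not from the original post): adds per-text word counts
// into one running total instead of merging the raw word lists.
function countWordsAcross(array $texts): array
{
    $totals = [];
    foreach ($texts as $text) {
        // Split on anything that is not a lowercase letter, dropping empty pieces.
        $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        // Add this text's counts to the totals collected so far.
        foreach (array_count_values($words) as $word => $count) {
            $totals[$word] = ($totals[$word] ?? 0) + $count;
        }
    }
    arsort($totals); // highest count first, keys preserved
    return $totals;
}

// Made-up sample sentences, used only to show the call.
$totals = countWordsAcross([
    'The quick brown fox jumped over the lazy dog.',
    'A lazy fox cannot catch a fast dog. The dog may be very fast.',
]);

foreach ($totals as $word => $count) {
    echo "$word -- $count<br/>";
}
```

To run this over the folder from the question, the texts could be loaded with something like `array_map('file_get_contents', glob($path . '/*.*'))`. The sketch prints every word rather than slicing to ten. With `array_slice(..., 0, 10)`, any text with more than ten distinct words will have the rest cut from the output, which may be why some words appear to be ignored.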