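I know how to get word frequencies in a text using explode and a few array functions, but what I really want is the frequency of sequences of 2 or more words. For example, given this text:
"This is a sample text. It is a sample text made for educational purposes."
I need code that produces this:
is a (2)
sample text (2)
a sample (2)
...etc.
Thanks in advance.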
Answer 0 (score: 0)
Some pseudocode to get you started:
frequencies = empty array
words = explode sentence on white spaces
for each word in words :
    sanitized word = trim word and convert to lower case
    frequencies[ sanitized word ] ++
endforeach
The frequencies array now contains the number of times each word appears in the sentence.
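For reference, here is one way that pseudocode might look in PHP. This is only a sketch: the function name wordFrequencies is made up for illustration, and the list of punctuation characters passed to trim() is an assumption about what "sanitizing" should remove. Note that it still counts single words only, not sequences of words.

```php
<?php
// Hypothetical helper (not part of the original answer): counts single words.
function wordFrequencies(string $sentence): array
{
    // Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
    $words = preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY);

    $frequencies = [];
    foreach ($words as $word) {
        // Assumption: strip common ASCII punctuation from both ends, then lowercase.
        $sanitized = strtolower(trim($word, " \t\n\r.,;:!?\"'()"));
        if ($sanitized === '') {
            continue; // the token was punctuation only
        }
        $frequencies[$sanitized] = ($frequencies[$sanitized] ?? 0) + 1;
    }
    return $frequencies;
}

print_r(wordFrequencies('This is a sample text. It is a sample text made for educational purposes.'));
```

With the sample sentence from the question, "is", "a", "sample" and "text" each end up with a count of 2.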
Answer 1 (score: 0)
The following code counts pairs of 2 consecutive words:
$string = 'This is a sample text. It is a sample text made for educational purposes. This is a sample text. It is a sample text made for educational purposes.';
$sanitized = $even = preg_replace(array('#[^\pL\s]#', '#\s+#'), array(' ', ' '), $string); // sanitize: only letters, replace multiple whitespaces with 1
$odd = preg_replace('#^\s*\S+#', '', $sanitized); // Remove the first word
preg_match_all('#\S+\s\S+#', $even, $m1); // Get 2 words
preg_match_all('#\S+\s\S+#', $odd, $m2); // Get 2 words
$results = array_count_values(array_merge($m1[0], $m2[0])); // Merge results and count
print_r($results); // printing
Output:
Array
(
[This is] => 2
[a sample] => 4
[text It] => 2
[is a] => 4
[sample text] => 4
[made for] => 2
[educational purposes] => 2
[It is] => 2
[text made] => 2
[for educational] => 2
[purposes This] => 1
)
One possible improvement would be to convert the string to lowercase first.
I'll leave that to others :-)
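Picking up the lowercase suggestion, and the "2 words and more" part of the question, here is a sketch of a sliding-window variant. It is not the approach used in the code above: the function name ngramFrequencies and its $n parameter are hypothetical, strtolower() only lowercases ASCII letters (mb_strtolower() would be the alternative if the mbstring extension is available), and the input is assumed to be valid UTF-8 because of the "u" modifier.

```php
<?php
// Hypothetical helper: counts every run of $n consecutive words in $text.
function ngramFrequencies(string $text, int $n = 2): array
{
    // Lowercase first so "This is" and "this is" are counted together.
    // Assumption: ASCII-only lowercasing is good enough for the input.
    $text = strtolower($text);

    // Extract runs of letters; everything else acts as a separator.
    preg_match_all('/\pL+/u', $text, $matches);
    $words = $matches[0];

    $counts = [];
    // Slide a window of $n words over the list, one word at a time.
    for ($i = 0, $last = count($words) - $n; $i <= $last; $i++) {
        $ngram = implode(' ', array_slice($words, $i, $n));
        $counts[$ngram] = ($counts[$ngram] ?? 0) + 1;
    }

    arsort($counts); // most frequent first
    return $counts;
}

$text = 'This is a sample text. It is a sample text made for educational purposes.';
print_r(ngramFrequencies($text, 2)); // pairs of words
print_r(ngramFrequencies($text, 3)); // triples of words
```

For the single sentence from the question, the pairs "is a", "a sample" and "sample text" each get a count of 2, and every other pair gets 1. Like the code above, this ignores sentence boundaries, so "text it" is counted as a pair too; splitting on punctuation first would avoid that.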