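I'm trying to write PHP code that iterates over a .csv file and finds the words shared between the first and second lines, the first and third lines, and so on.
My test.csv file looks like this: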
1,"most of the birds lives in the village"
2,"birds in the village are free"
3,"no birds can be trapped in the village"
4,"people of the village are so kind"
5,"these birds are special species where most of them are so small"
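Note: the numbers shown in the example above are part of the test.csv file. They are the keys for the comparison (I call them sentence IDs), and I will use them to label each comparison.
So, using the test.csv file above, what I want to do is compare line 1 with line 2 and report how many words they have in common, then compare line 1 with line 3 and report how many words they have in common, and so on, producing output in the following format: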
1:2 =?
1:3 =?
1:4 =?
1:5 =?
and
2:3 =?
2:4 =?
2:5 =?
and
3:4 =?
3:5 =?
and
4:5 =?
Answer 0: (score: 2)
Try this. It's not perfect, but it gets the job done:
// You can use PHP's array_intersect
function checkSimilarity($string1, $string2) {
    $arr1 = explode(" ", $string1);
    $arr2 = explode(" ", $string2);
    // words present in both sentences, with duplicates removed
    $result = array_intersect(array_unique($arr1), array_unique($arr2));
    return count($result); // number of matches
}

$sentences = [
    1 => "most of the birds lives in the village",
    2 => "birds in the village are free",
    3 => "no birds can be trapped in the village",
    4 => "people of the village are so kind",
    5 => "these birds are special species where most of them are so small"
];

// loop through the array
foreach ($sentences as $mainKey => $value) {
    // compare each sentence only with the sentences that have a larger ID
    foreach ($sentences as $key => $v) {
        if ($key > $mainKey) {
            echo $mainKey . ":" . $key . " = " . checkSimilarity($value, $v) . "\n";
        }
    }
}
Example sandbox: http://sandbox.onlinephpfunctions.com/code/c571beb140a1dc114b42bfd884fbe33e348f76c5
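The snippet above hardcodes the sentences, while the question reads them from test.csv. A rough sketch of loading the file into the same id => sentence array might look like the following. The helper names loadSentences and comparePairs are made up for illustration, and the sketch assumes every row has exactly the two columns shown in the question (ID, then sentence).

```php
<?php
// Hypothetical helper: reads "id,sentence" rows into [id => sentence].
function loadSentences(string $path): array
{
    $sentences = [];
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return $sentences;
    }
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        // fgetcsv returns [null] for a blank line, so skip rows without two columns.
        if (count($row) < 2) {
            continue;
        }
        $sentences[(int) $row[0]] = $row[1];
    }
    fclose($handle);
    return $sentences;
}

// Hypothetical helper: returns ["1:2" => count, "1:3" => count, ...].
function comparePairs(array $sentences): array
{
    $ids = array_keys($sentences);
    $n = count($ids);
    $result = [];
    for ($i = 0; $i < $n; $i++) {
        for ($j = $i + 1; $j < $n; $j++) {
            // Count the distinct words the two sentences have in common.
            $a = array_unique(explode(' ', $sentences[$ids[$i]]));
            $b = array_unique(explode(' ', $sentences[$ids[$j]]));
            $result[$ids[$i] . ':' . $ids[$j]] = count(array_intersect($a, $b));
        }
    }
    return $result;
}

// Usage: only runs when test.csv is present in the working directory.
if (is_readable('test.csv')) {
    foreach (comparePairs(loadSentences('test.csv')) as $pair => $count) {
        echo $pair . ' = ' . $count . "\n";
    }
}
```

Indexing the ID list by position means the IDs in the file do not have to be consecutive.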
Answer 1: (score: 1)
Another snippet:
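<?php
// Read every CSV line into [id, sentence], dropping line endings and empty lines.
$csv = array_map('str_getcsv', file('test.csv', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES));

foreach ($csv as $key => $row) {
    compareLine($csv, $key);
}

function compareLine($csv, $key)
{
    // Rows from the current one onward; index 0 is the current row itself.
    $temp = array_slice($csv, $key);
    foreach ($temp as $index => $row) {
        if ($index === 0) continue;
        $firstSentence = $csv[$key][1];
        $secondSentence = $row[1];
        // Unique words of the first sentence that also occur in the second.
        $common = array_intersect(array_unique(explode(" ", $firstSentence)), explode(" ", $secondSentence));
        echo "{$csv[$key][0]}:{$row[0]}" . ' = ' . implode(", ", $common) . PHP_EOL;
    }
}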
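Both answers split on a single space and compare case-sensitively, so "Birds" and "birds," would not match "birds". If that matters for your data, one possible variation is to normalize the words first. This is only a sketch: tokenize and sharedWords are hypothetical names, and treating anything other than ASCII letters, digits and apostrophes as a separator is an assumption about what counts as a word.

```php
<?php
// Hypothetical helper: lowercases a sentence and returns its distinct words.
function tokenize(string $sentence): array
{
    // Split on every run of characters that is not a-z, 0-9 or an apostrophe.
    $words = preg_split('/[^a-z0-9\']+/', strtolower($sentence), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique($words));
}

// Hypothetical helper: distinct words that occur in both sentences,
// in the order they first appear in the first sentence.
function sharedWords(string $first, string $second): array
{
    return array_values(array_intersect(tokenize($first), tokenize($second)));
}

$shared = sharedWords(
    "most of the birds lives in the village",
    "birds in the village are free"
);
echo "1:2 = " . count($shared) . " (" . implode(", ", $shared) . ")" . PHP_EOL;
// prints: 1:2 = 4 (the, birds, in, village)
```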