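I have a very large dataset made up of many arrays, and I'm trying to find the smallest set of values that satisfies all of them: the final set must contain at least one value from every array.
A small sample of the data: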
[0] => Array
    (
        [0] => 21
        [1] => 21
        [2] => 21
    )
[1] => Array
    (
        [0] => 29
    )
[2] => Array
    (
        [0] => 27
    )
[3] => Array
    (
        [0] => 21
        [1] => 21
        [2] => 21
        [3] => 39
        [4] => 39
        [5] => 43
    )
[4] => Array
    (
        [0] => 29
        [1] => 33
        [2] => 33
        [3] => 43
    )
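In this case I need logic that returns 21, 27 and 29: the returned values must be the smallest set of values that matches every array. Since I'm a PHP programmer, I'm writing this function in PHP.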
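For small inputs the exact answer can be found by brute force: try every combination of distinct values, smallest size first, and stop at the first combination that hits every array. The sketch below assumes PHP 7.4+ and uses helper names of my own (`coversAll`, `combinations`, `exactHittingSet`). Its running time grows exponentially with the number of distinct values, so it is a way to check results on small samples, not a solution for the full dataset.

```php
<?php
// Exact minimum hitting set by brute force (hypothetical helper names).
// Exponential in the number of distinct values: small inputs only.

// True if every array contains at least one of the chosen values.
function coversAll(array $chosen, array $sets): bool
{
    foreach ($sets as $set) {
        if (!array_intersect($set, $chosen)) {
            return false; // this array has none of the chosen values
        }
    }
    return true;
}

// Yield every $k-element combination of $items, preserving order.
function combinations(array $items, int $k, int $start = 0): Generator
{
    if ($k === 0) {
        yield [];
        return;
    }
    for ($i = $start, $n = count($items); $i <= $n - $k; $i++) {
        foreach (combinations($items, $k - 1, $i + 1) as $rest) {
            yield array_merge([$items[$i]], $rest);
        }
    }
}

function exactHittingSet(array $sets): array
{
    // Candidates: every distinct value that occurs in any array.
    $candidates = array_values(array_unique(array_merge(...$sets)));
    sort($candidates);
    // Try sizes 1, 2, 3, ... so the first match is a smallest one.
    for ($k = 1; $k <= count($candidates); $k++) {
        foreach (combinations($candidates, $k) as $combo) {
            if (coversAll($combo, $sets)) {
                return $combo;
            }
        }
    }
    return []; // reached only if $sets is empty or contains an empty array
}

$data = [
    [21, 21, 21],
    [29],
    [27],
    [21, 21, 21, 39, 39, 43],
    [29, 33, 33, 43],
];
print_r(exactHittingSet($data)); // prints 21, 27 and 29
```

On the sample, 21, 27 and 29 are each forced: the arrays `[21, 21, 21]`, `[29]` and `[27]` each contain only one distinct value, and those three values already hit the remaining two arrays.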
Answer (score: 2)
You can follow this algorithm (updated after testing):
$data = array(
    array(21, 29, 27, 57, 22),
    array(22, 21, 23, 24, 25, 26),
    array(31),
);
// map each value to the ids of the sets it occurs in
$map = array();
foreach ($data as $setid => $set) {
    foreach (array_unique($set) as $v) {
        $map[$v][$setid] = true;
    }
}
function reverseCount(array $a, array $b)
{
    return count($b) - count($a);
}
// sort values by the number of sets they occur in, most frequent first
uasort($map, 'reverseCount');
// keep track of which sets have already been hit;
// $n counts the sets that are still unhit
$setmap = array();
$n = count($data);
foreach ($map as $v => $sets) {
    $hits = 0;
    foreach ($sets as $setid => $dummy) {
        if (!isset($setmap[$setid])) {
            --$n;
            ++$hits;
            $setmap[$setid] = true;
        }
    }
    // only report values that hit at least one new set
    if ($hits) {
        echo "value $v\n";
        if (!$n) {
            // all sets are hit
            break;
        }
    }
}
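The loop above ranks values once, by how many arrays they occur in, and never re-ranks them after some arrays have been hit. The textbook greedy for this problem instead recomputes, after every pick, how many *still unhit* arrays each value would hit; that variant is known to stay within roughly a factor of ln(m) + 1 of the optimum, where m is the number of arrays. Below is a sketch of that variant; the function name `greedyHittingSet` is my own, and ties go to the value seen first.

```php
<?php
// Greedy hitting set with counts recomputed after every pick
// (hypothetical function name; a sketch, not the answer's exact code).
function greedyHittingSet(array $sets): array
{
    // value => list of ids of the arrays that contain it
    $map = [];
    foreach ($sets as $setId => $set) {
        foreach (array_unique($set) as $v) {
            $map[$v][] = $setId;
        }
    }

    $unhit = array_fill_keys(array_keys($sets), true);
    $chosen = [];
    while ($unhit) {
        $bestValue = null;
        $bestGain = 0;
        foreach ($map as $v => $setIds) {
            // how many still-unhit arrays this value would hit
            $gain = 0;
            foreach ($setIds as $id) {
                if (isset($unhit[$id])) {
                    $gain++;
                }
            }
            if ($gain > $bestGain) { // strict: first value wins ties
                $bestGain = $gain;
                $bestValue = $v;
            }
        }
        if ($bestValue === null) {
            break; // only empty arrays remain, nothing can hit them
        }
        $chosen[] = $bestValue;
        foreach ($map[$bestValue] as $id) {
            unset($unhit[$id]);
        }
        unset($map[$bestValue]);
    }
    return $chosen;
}

$data = [[21, 21, 21], [29], [27], [21, 21, 21, 39, 39, 43], [29, 33, 33, 43]];
print_r(greedyHittingSet($data)); // prints 21, 29, 27
```

On the question's sample this happens to find an optimal answer (the same three values, in pick order). Each round scans every value, so for a very large dataset a priority queue or bucketed counts would be worth considering.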
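This time it's been tested. It is not guaranteed to always give the optimal result, because it is a greedy approximation algorithm: finding the true minimum is the hitting-set problem (equivalent to set cover), which is NP-hard.
I hope it still illustrates what you can do. If anything is confusing, or if I've got something wrong, please let me know.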
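To see why a greedy result can be larger than the minimum, here is a small hypothetical input, constructed for illustration, where the most frequent value is not part of the best answer. The compact greedy below (my own `greedyPick`, assuming PHP 7.4+) picks 30 first because it occurs in four arrays, and then still needs 10 and 20, while 10 and 20 alone already hit all six arrays.

```php
<?php
// Hypothetical counterexample: the most frequent value (30) is not in
// the smallest answer, which is [10, 20].
$sets = [
    [10, 30],
    [10, 30],
    [10],
    [20, 30],
    [20, 30],
    [20],
];

// Compact greedy: each round, pick the value occurring in the most
// arrays not yet hit, then drop the arrays it hits.
function greedyPick(array $sets): array
{
    $chosen = [];
    while ($sets) {
        $counts = [];
        foreach ($sets as $set) {
            foreach (array_unique($set) as $v) {
                $counts[$v] = ($counts[$v] ?? 0) + 1;
            }
        }
        if (!$counts) {
            break; // only empty arrays are left
        }
        arsort($counts); // highest count first
        $best = array_key_first($counts);
        $chosen[] = $best;
        // keep only the arrays the chosen value does not hit
        $sets = array_filter($sets, fn($set) => !in_array($best, $set));
    }
    return $chosen;
}

print_r(greedyPick($sets)); // greedy needs 3 values: 30, then 10 and 20
```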