如何将搜索字词捆绑到更有效的查询中?

时间:2016-05-11 18:36:12

标签: php algorithm search graph combinations

我需要将搜索字词列表转换为最有效的组合搜索字词集。任何单词或引用的短语都可以用OR分隔。许多术语可以在括号内组合。也可以使用AND。

例如,foo barboo bar分享bar,因此可以将(foo OR boo) AND bar合并为foo bar boo bar goo bar hoo doo foo manchu moo bar too bar foo fighters "blue kazoo" bar baz qux quux 而不是两个不同的搜索字词。

这是算法需要做的事情。鉴于此数据集:

(foo OR boo OR goo OR moo OR too OR "blue kazoo") AND bar
foo AND (manchu OR fighters)
hoo doo
baz OR qux OR quux

我希望得到以下回复:

(foo bar) OR (boo bar) OR (goo bar) OR (foo manchu)

这不起作用:

Send-MailMessage -SmtpServer "yoursmtpserver"

我将使用PHP,但我会用伪代码,PHP或我将从主要语言转换答案。

3 个答案:

答案 0 :(得分:4)

我收到了以下代码:

function keyMultiSort(&$array,
                      $key,
                      $reverse = false,
                      $priority_last = false,
                      $save_key = true,
                      Callable $func = null)
{
    if ($func === null)
    {
        $func = function ($first, $second) use ($key, $reverse, $priority_last)
        {
            if (!isset($first[$key]))
            {
                return ($reverse === false) ? -1 : 1;
            }
            if (!isset($second[$key]))
            {
                return ($reverse === false) ? 1 : -1;
            }

            if ($first[$key] > $second[$key])
            {
                return ($reverse === false) ? 1 : -1;
            }
            if ($first[$key] < $second[$key])
            {
                return ($reverse === false) ? -1 : 1;
            }
            if ($first[$key] === $second[$key])
            {
                return ($priority_last === false) ? 1 : -1;
            }

            return 0;
        };
    }

    if ($save_key)
    {
        uasort($array, $func);
    }
    else
    {
        usort($array, $func);
    }
}

$array = [
    ['foo', 'bar'],
    ['boo', 'bar'],
    ['goo', 'bar'],
    ['hoo', 'doo'],
    ['foo', 'manchu'],
    ['moo', 'bar'],
    ['too', 'bar'],
    ['foo', 'fighters'],
    ['blue kazoo', 'bar'],
];

$pairs = [];
$str = '';
foreach($array as $item)
{
    if(!isset($pairs[$item[0]]['count']))
    {
        $pairs[$item[0]]['count'] = 1;
    }
    else
    {
        $pairs[$item[0]]['count']++;
    }
    $pairs[$item[0]]['elements'][] = $item[1];

    if(!isset($pairs[$item[1]]['count']))
    {
        $pairs[$item[1]]['count'] = 1;
    }
    else
    {
        $pairs[$item[1]]['count']++;
    }
    $pairs[$item[1]]['elements'][] = $item[0];
    keyMultiSort($pairs, 'count', true);
}

$remove = [];
foreach($pairs as $elm=>$item)
{
    $remove[] = $elm;
    $elements = array_diff($item['elements'], $remove);
    if(empty($elements))
    {
        if (in_array($elm, $remove))
        {
            continue;
        }
        $str .= $elm.PHP_EOL;
    }
    else
    {
        $str .= $elm.' AND ('.implode(' OR ', $elements).')'.PHP_EOL;
    }
    $remove = array_merge($remove, $elements);
}
var_dump($str);

结果:

string(99) "bar AND (foo OR boo OR goo OR moo OR too OR blue kazoo)
foo AND (manchu OR fighters)
hoo AND (doo)
"

可根据目标进行优化......

答案 1 :(得分:3)

我理解逻辑,但你真的需要让问题更清晰。

无论如何,我认为这是一个图形问题,我们希望找到程度最高且可以跨越整个图形的节点集。

enter image description here

我相信如果你以这种方式描绘它,你可以使用你喜欢的任何数据结构来达到目的。您可以创建邻接列表,然后查找具有更高度数的节点,然后检查是否所有元素都通过这些节点覆盖。之后添加AND,OR的问题很简单。

答案 2 :(得分:1)

处理2个以上值的代码

<?php
function keyMultiSort(&$array,
                      $key,
                      $reverse = false,
                      $priority_last = false,
                      $save_key = true,
                      Callable $func = null)
{
    if ($func === null)
    {
        $func = function ($first, $second) use ($key, $reverse, $priority_last)
        {
            if (!isset($first[$key]))
            {
                return ($reverse === false) ? -1 : 1;
            }
            if (!isset($second[$key]))
            {
                return ($reverse === false) ? 1 : -1;
            }

            if ($first[$key] > $second[$key])
            {
                return ($reverse === false) ? 1 : -1;
            }
            if ($first[$key] < $second[$key])
            {
                return ($reverse === false) ? -1 : 1;
            }
            if ($first[$key] === $second[$key])
            {
                return ($priority_last === false) ? 1 : -1;
            }

            return 0;
        };
    }

    if ($save_key)
    {
        uasort($array, $func);
    }
    else
    {
        usort($array, $func);
    }
}

$array = [
    ['foo', 'bar', 'test'],
    ['boo', 'bar'],
    ['goo', 'bar'],
    ['hoo', 'doo', 'test', 'test2'],
    ['foo', 'manchu'],
    ['moo', 'bar'],
    ['too', 'bar'],
    ['foo', 'fighters'],
    ['blue kazoo', 'bar', 'test'],
];

$pairs = [];
$str = '';
foreach($array as $item)
{
    foreach($item as $key=>$elm)
    {
        foreach($item as $key2=>$elm2)
        {
            if($key !== $key2)
            {
                if(!isset($pairs[$elm]['count']))
                {
                    $pairs[$elm]['count'] = 1;
                }
                else
                {
                    $pairs[$elm]['count']++;
                }
                $pairs[$elm]['elements'][] = $elm2;
            }
        }
    }

    keyMultiSort($pairs, 'count', true);
}
//var_dump($pairs);
$remove = [];
foreach($pairs as $elm=>$item)
{
    $remove[] = $elm;
    $elements = array_diff($item['elements'], $remove);
    if(empty($elements))
    {
        if (in_array($elm, $remove))
        {
            continue;
        }
        $str .= $elm.PHP_EOL;
    }
    else
    {
        $str .= $elm.' AND ('.implode(' OR ', array_unique($elements)).')'.PHP_EOL;
    }
}
var_dump($str);

<强>响应:

string(184) "bar AND (foo OR test OR boo OR goo OR moo OR too OR blue kazoo)
test AND (foo OR hoo OR doo OR test2 OR blue kazoo)
foo AND (manchu OR fighters)
hoo AND (doo OR test2)
doo AND (test2)
"

P.S。我希望我已正确理解这项任务......

<强>更新 添加了不忽略“单个值”的代码。 我改变了逻辑:

...

['"yellow balloon"', 'foo', 'bar', 'baz', 'qut'],

...

返回:

...

qut AND ("yellow balloon" OR baz)
baz AND ("yellow balloon")

...

在我看来,对于这项任务来说,这是正确的(对于组合超过2个值的条件)。

function keyMultiSort(&$array,
                      $key,
                      $reverse = false,
                      $priority_last = false,
                      $save_key = true,
                      Callable $func = null)
{
    if ($func === null)
    {
        $func = function ($first, $second) use ($key, $reverse, $priority_last)
        {
            if (!isset($first[$key]))
            {
                return ($reverse === false) ? -1 : 1;
            }
            if (!isset($second[$key]))
            {
                return ($reverse === false) ? 1 : -1;
            }

            if ($first[$key] > $second[$key])
            {
                return ($reverse === false) ? 1 : -1;
            }
            if ($first[$key] < $second[$key])
            {
                return ($reverse === false) ? -1 : 1;
            }
            if ($first[$key] === $second[$key])
            {
                return ($priority_last === false) ? 1 : -1;
            }

            return 0;
        };
    }

    if ($save_key)
    {
        uasort($array, $func);
    }
    else
    {
        usort($array, $func);
    }
}

$array = [
    ['foo', 'bar', 'test'],
    ['boo', 'bar'],
    ['goo', 'bar'],
    ['hoo', 'doo', 'test', 'test2'],
    ['foo', 'manchu'],
    ['moo', 'bar'],
    ['too', 'bar'],
    ['foo', 'fighters'],
    ['"blue kazoo"', 'bar', 'test'],
    ['"red panda"', 'bar', 'test'],
    ['"yellow balloon"', 'foo', 'bar', 'baz', 'qut'],
    ['"red panda"', 'fighters', 'moo'],
    ['"foo fighters"'],
    ['foo'],
    ['bar'],
];

$pairs = [];
$singles = [];
$str = '';
foreach ($array as $item)
{
    foreach ($item as $key => $elm)
    {
        if(count($item) === 1)
        {
            $singles[$elm] = 1;
        }
        else
        {
            if (!isset($pairs[$elm]))
            {
                $pairs[$elm]['count'] = 0;
                $pairs[$elm]['elements'] = [];
            }
            foreach ($item as $key2 => $elm2)
            {
                if ($key !== $key2)
                {
                    $pairs[$elm]['count']++;
                    $pairs[$elm]['elements'][] = $elm2;
                }
            }
        }
    }

    keyMultiSort($pairs, 'count', true);
}
//var_dump($pairs);exit;
$remove = [];
foreach ($pairs as $elm => $item)
{
    $remove[] = $elm;
    $elements = array_diff($item['elements'], $remove);
    $elements = array_unique($elements);
    if (!empty($elements)){
        $str .= $elm.' AND ('.implode(' OR ', $elements).')'.PHP_EOL;
    }
}
foreach ($singles as $elm => $item)
{
    $str .= $elm.PHP_EOL;
}
var_dump($str);

响应:

string(421) "bar AND (foo OR test OR boo OR goo OR moo OR too OR "blue kazoo" OR "red panda" OR "yellow balloon" OR baz OR qut)
test AND (foo OR hoo OR doo OR test2 OR "blue kazoo" OR "red panda")
foo AND (manchu OR fighters OR "yellow balloon" OR baz OR qut)
"red panda" AND (fighters OR moo)
qut AND ("yellow balloon" OR baz)
baz AND ("yellow balloon")
test2 AND (hoo OR doo)
fighters AND (moo)
doo AND (hoo)
"foo fighters"
foo
bar
"

<强> P.S。在我看来,这个问题不适用于现实