如何基于其中一个值的相似性对一个动态数组进行array_merge

时间:2010-09-09 16:18:40

标签: php arrays merge similarity

美好的一天,

我正在使用cURL和各种解析技术从各种网站检索信息。我做了 如果需要,我可以添加其他网站来扫描信息。

检索到的信息如下: (请注意,信息可能不准确,可能无法反映实际价格/名称)

Array
(
    [website1.com] => Array
        (
            [0] => Array
                (
                    [0] => 60" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 5299.99
                )
            [1] => Array
                (
                    [0] => 52" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 4499.99
                )
            [2] => Array
                (
                    [0] => 46" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 3699.99
                )
            [3] => Array
                (
                    [0] => 40" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 2999.99
                )
        )
    [website2.com] => Array
        (
            [0] => Array
                (
                    [0] => Sony 3D 60" LX900 HDTV BRAVIA
                    [1] => website2.com
                    [2] => 5400.99
                )
            [1] => Array
                (
                    [0] => Sony 3D 52" LX900 HDTV BRAVIA
                    [1] => website2.com
                    [2] => 4699.99
                )
            [2] => Array
                (
                    [0] => Sony 3D 46" LX900 HDTV BRAVIA
                    [1] => website2.com
                    [2] => 3899.99
                )
        )
)

所需的输出必须是:

Array
(
    [0] => Array
        (
            [Name] => 60" BRAVIA LX900 Series 3D HDTV
            [website1.com] => 5299.99
            [website2.com] => 5400.99
        )
    [1] => Array
        (
            [Name] => 52" BRAVIA LX900 Series 3D HDTV
            [website1.com] => 4499.99
            [website2.com] => 4699.99
        )
    [2] => Array
        (
            [Name] => 46" BRAVIA LX900 Series 3D HDTV
            [website1.com] => 3699.99
            [website2.com] => 3899.99
        )
    [3] => Array
        (
            [Name] => 40" BRAVIA LX900 Series 3D HDTV
            [website1.com] => 2999.99
        )
)

请注意,名称可能会有所不同,因此需要使用similar_text。此外,某些信息可能不会在所有网站上显示。 我知道只有一个电视名称必须选择,然后我将使用最相关的来源(website1.com)

以下是我正在努力工作的代码。

<?php
    $_Retreived = array(
        "website1.com" => array(
            array('60" BRAVIA LX900 Series 3D HDTV', 'website1.com', 5299.99),
            array('52" BRAVIA LX900 Series 3D HDTV', 'website1.com', 4499.99),
            array('46" BRAVIA LX900 Series 3D HDTV', 'website1.com', 3699.99),
            array('40" BRAVIA LX900 Series 3D HDTV', 'website1.com', 2999.99)
        ),
        "website2.com" => array(
            array('Sony 3D 60" LX900 HDTV BRAVIA', 'website2.com', 5400.99),
            array('Sony 3D 52" LX900 HDTV BRAVIA', 'website2.com', 4699.99),
            array('Sony 3D 46" LX900 HDTV BRAVIA', 'website2.com', 3899.99),
        )
    );

    $_Prices = array();
    $_PricesTemp = array();
    $_Sites = array("website1.com", "website2.com");

    for($i = 0; $i < sizeOf($_Sites); $i++)
    {
        $_PricesTemp = array_merge($_PricesTemp, $_Retreived[ $_Sites[$i] ]);
    }

    /*
        print_r($_PricesTemp);

        Array
        (
            [0] => Array
                (
                    [0] => 60" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 5299.99
                )
            [1] => Array
                (
                    [0] => 52" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 4499.99
                )
            [2] => Array
                (
                    [0] => 46" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 3699.99
                )
            [3] => Array
                (
                    [0] => 40" BRAVIA LX900 Series 3D HDTV
                    [1] => website1.com
                    [2] => 2999.99
                )
            [4] => Array
                (
                    [0] => Sony 3D 60" LX900 HDTV BRAVIA
                    [1] => website2.com
                    [2] => 5400.99
                )
            [5] => Array
                (
                    [0] => Sony 3D 52" LX900 HDTV BRAVIA
                    [1] => website2.com
                    [2] => 4699.99
                )
            [6] => Array
                (
                    [0] => Sony 3D 46" LX900 HDTV BRAVIA
                    [1] => website2.com
                    [2] => 3899.99
                )
        )
    */

    foreach($_PricesTemp As $_KeyOne => $_EntryOne)
    {
        foreach(array_reverse($_PricesTemp, true) As $_KeyTwo => $_EntryTwo)
        {
            if ($_KeyOne != $_KeyTwo)
            {
                $_Percent = 0;

                similar_text(strtoupper($_EntryOne[0]), strtoupper($_EntryTwo[0]), $_Percent);

                if ($_Percent >= 90) //If names matches 90%+
                {
                    echo "Similar : <b>" . $_KeyOne . "</b> " . $_EntryOne[0] . " and <b>" . $_KeyTwo . "</b> " . $_EntryTwo[0] . " Percent : " . $_Percent . "<br />";

                    $_Prices[] = array();
                    $_Prices[ sizeOf($_Prices)-1 ]['Name'] = $_EntryOne[0]; //Use the product name of the most revelant website (website1.com)

                    foreach($_Sites As $_Site)
                    {
                        if (isset($_EntryOne[ 1 ]) && $_EntryOne[ 1 ] == $_Site) //Check if it contains price from website1.com
                        {
                            $_Prices[ sizeOf($_Prices)-1 ][ $_Site ] = $_EntryOne[ 2 ];
                        }
                        if (isset($_EntryTwo[ 1 ]) && $_EntryTwo[ 1 ] == $_Site) //Check if it contains price from website2.com
                        {
                            $_Prices[ sizeOf($_Prices)-1 ][ $_Site ] = $_EntryTwo[ 2 ];
                        }
                    }
                }
            }
        }
    }

    /*
        print_r($_Prices);

        Array
        (
            [0] => Array
                (
                    [Name] => 60" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 2999.99
                )
            [1] => Array
                (
                    [Name] => 60" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 3699.99
                )
            [2] => Array
                (
                    [Name] => 60" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 4499.99
                )
            [3] => Array
                (
                    [Name] => 52" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 2999.99
                )
            [4] => Array
                (
                    [Name] => 52" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 3699.99
                )
            [5] => Array
                (
                    [Name] => 52" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 5299.99
                )
            [6] => Array
                (
                    [Name] => 46" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 2999.99
                )
            [7] => Array
                (
                    [Name] => 46" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 4499.99
                )
            [8] => Array
                (
                    [Name] => 46" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 5299.99
                )
            [9] => Array
                (
                    [Name] => 40" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 3699.99
                )
            [10] => Array
                (
                    [Name] => 40" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 4499.99
                )
            [11] => Array
                (
                    [Name] => 40" BRAVIA LX900 Series 3D HDTV
                    [website1.com] => 5299.99
                )
            [12] => Array
                (
                    [Name] => Sony 3D 60" LX900 HDTV BRAVIA
                    [website2.com] => 3899.99
                )
            [13] => Array
                (
                    [Name] => Sony 3D 60" LX900 HDTV BRAVIA
                    [website2.com] => 4699.99
                )
            [14] => Array
                (
                    [Name] => Sony 3D 52" LX900 HDTV BRAVIA
                    [website2.com] => 3899.99
                )
            [15] => Array
                (
                    [Name] => Sony 3D 52" LX900 HDTV BRAVIA
                    [website2.com] => 5400.99
                )
            [16] => Array
                (
                    [Name] => Sony 3D 46" LX900 HDTV BRAVIA
                    [website2.com] => 4699.99
                )
            [17] => Array
                (
                    [Name] => Sony 3D 46" LX900 HDTV BRAVIA
                    [website2.com] => 5400.99
                )
        )
    */
?>

首先,上面的代码不起作用。某个地方必定存在逻辑错误,我无法指责。另外,我没有 相信代码将起作用,以防我在列表中添加第三个网站。

任何想法的家伙?从今天早上起我就一直在这。

编辑2011-02-16:

我在这个问题上加了一笔赏金。

3 个答案:

答案 0 :(得分:1)

尝试这个要点更清晰https://gist.github.com/835099

它为我产生了你期望的结果。

答案 1 :(得分:0)

高级概述应该是这样的:

  • 创建最终结果数组$ items
  • 遍历所有网站中找到的所有项目
  • 对于每一个,检查它是否与$ items
  • 中的任何现有项目名称相似
  • 如果是,则将价格添加到该密钥,如果否,则创建一个新密钥并将其添加到该密钥

您应该考虑使用similar_text()代替levenshtein(),而$levThreshold = 3 ; $_Prices = array() ; foreach ($_Retreived as $website => $websiteItems) { $currName = $websiteItems[0] ; $currWebsite = $websiteItems[1] ; $currPrice = $websiteItems[2] ; $foundItemKey = false ; //check current price structure. Get $priceData by reference //so we can modify it in the loop and keep the changed instead //of the loop copy. foreach ($_Prices as &$priceData) { if (isset($priceData[$website])) { //already done this continue ; } //check if this is the item name we are looping over $lev = levenshtein($priceData['Name'], $currName) ; if ($lev < $levThreshold) { //item exists, add price and break $priceData[$website] = $currPrice ; $foundItemKey = true ; break ; } } //if we haven't found the item key, create a new one if (!$foundItemKey) { $newItem = array() ; $newItem['Name'] = $currName ; $newItem[$website] = $currPrice ; $_Prices[] = $newItem ; } } 在实践中类似但速度相当快。

这是一些(未经测试的,现场)代码:

$levThreshold

{{1}}是2个字符串之间必须不同的最小字符数,以使它们被视为不同。你可以相应地调整它。

答案 2 :(得分:0)

使用similar_text无法解决问题。您希望60" BRAVIA LX900 Series 3D HDTVSony 3D 60" LX900 HDTV BRAVIA匹配。但是,60" BRAVIA LX900 Series 3D HDTV实际上与52" BRAVIA LX900 Series 3D HDTV更相似,只有两个字符不同。

我怀疑您需要一个自定义处理程序来匹配特定于您要匹配的产品的详细信息。例如。对于电视,您可能希望匹配尺寸(xx")和产品系列(BRAVIA LX900)。

这并不能解决你的问题,但我担心答案。