Suppose I have an array like this:
$data[0]['name'] = 'product 1 brandX';
$data[0]['id_product'] = '77777777';
$data[1]['name'] = 'brandX product 1';
$data[1]['id_product'] = '77777777';
$data[2]['name'] = 'brandX product 1 RED';
$data[2]['id_product'] = '77777777';
$data[3]['name'] = 'product 1 brandX';
$data[3]['id_product'] = '';
$data[4]['name'] = 'product 2 brandY';
$data[4]['id_product'] = '8888888';
$data[5]['name'] = 'product 2 brandY RED';
$data[5]['id_product'] = '';
I'm trying to group the entries by similarity (of name or id_product).
This would be the expected final array:
$uniques[0]['name'] = 'product 1 brandX'; //The shortest name for the product
$uniques[0]['count'] = 4; //Number of entries that contain all the words of the shortest name or share the same id_product
$uniques[1]['name'] = 'product 2 brandY';
$uniques[1]['count'] = 2;
This is what I've done so far:
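$uniques = [];
foreach ($data as $t) {
    $id = $t['id_product'];
    if (!isset($uniques[$id])) {
        // first entry seen for this id_product
        $uniques[$id] = ['name' => $t['name'], 'count' => 0];
    } elseif (mb_strlen($t['name']) < mb_strlen($uniques[$id]['name'])) {
        // keep the shortest name seen for this id_product
        $uniques[$id]['name'] = $t['name'];
    }
    $uniques[$id]['count']++;
}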
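But I can't rely on id_product alone, because sometimes the same product appears once with an id and once without. I also have to check the name, but I haven't managed to do it.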
Answer 0 (score: 0)
I don't think this will solve your problem, but it may get you moving again:
$data = [];
$data[0]['name'] = 'product 1 brandX';
$data[0]['id_product'] = '77777777';
$data[1]['name'] = 'brandX product 1';
$data[1]['id_product'] = '77777777';
$data[2]['name'] = 'brandX product 1 RED';
$data[2]['id_product'] = '77777777';
$data[3]['name'] = 'product 1 brandX';
$data[3]['id_product'] = '';
$data[4]['name'] = 'product 2 brandY';
$data[4]['id_product'] = '8888888';
$data[5]['name'] = 'product 2 brandY RED';
$data[5]['id_product'] = '';
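// collect() is Laravel's collection helper (Illuminate\Support\Collection);
// it is not available in plain PHP.
$data = collect($data);
$tallies = [
    'brand_x' => 0,
    'brand_y' => 0,
    'other' => 0
];
// unique() keeps the first item for each key returned by the callback;
// the callback also increments a per-brand counter.
$unique = $data->unique(function ($item) use (&$tallies) {
    switch (true) {
        case strpos($item['name'], 'brandX') !== false:
            $tallies['brand_x']++;
            return 'product X';
        case strpos($item['name'], 'brandY') !== false:
            $tallies['brand_y']++;
            return 'product Y';
        default:
            $tallies['other']++;
            return 'other';
    }
});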
print_r($unique);
print_r($tallies);
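Because this answer depends on Laravel's collect(), here is a minimal sketch of the same brand tally in plain PHP, for projects without Laravel. The $brands map is a hypothetical hard-coded list, which is also the main limitation of this approach: every brand has to be known in advance.

```php
<?php
// Same sample data as in the question.
$data = [
    ['name' => 'product 1 brandX', 'id_product' => '77777777'],
    ['name' => 'brandX product 1', 'id_product' => '77777777'],
    ['name' => 'brandX product 1 RED', 'id_product' => '77777777'],
    ['name' => 'product 1 brandX', 'id_product' => ''],
    ['name' => 'product 2 brandY', 'id_product' => '8888888'],
    ['name' => 'product 2 brandY RED', 'id_product' => ''],
];

// Hypothetical brand list: substring to look for => tally bucket.
$brands = ['brandX' => 'brand_x', 'brandY' => 'brand_y'];

$tallies = ['brand_x' => 0, 'brand_y' => 0, 'other' => 0];
foreach ($data as $item) {
    $bucket = 'other';
    foreach ($brands as $needle => $label) {
        // first brand whose name occurs in the product name wins
        if (strpos($item['name'], $needle) !== false) {
            $bucket = $label;
            break;
        }
    }
    $tallies[$bucket]++;
}

print_r($tallies);
```

Like the original, this only counts entries per brand; it does not distinguish "product 1" from a hypothetical "product 3" of the same brand.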
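Answer 1 (score: 0)
I think the best way to solve this is to use a unique id_product, but if you want to build unique keys by looking for similarity in the name field, you can use preg_split to turn each name into an array of words and then array_diff to find the words that differ. If two names differ by fewer than 2 words, they are treated as the same product. I wrote this function, which returns the similar name (key) from $arr, or false if none is found: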
function get_similare_key($arr, $name) {
    $names = preg_split("/\s+/", $name);
    // look for a similar key in $arr: a key whose words all appear
    // in $name, with at most one exception
    foreach ($arr as $key => $value) {
        $key_names = preg_split("/\s+/", $key);
        $diff = array_diff($key_names, $names);
        if (count($diff) <= 1) {
            return $key;
        }
    }
    return false;
}
A working demo is available here.
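The answer only shows the helper, so here is one possible way to use it to build the name/count array from the question. This is a sketch under assumptions: each group is keyed by the first name seen for it (not necessarily the shortest), and id_product is ignored entirely.

```php
<?php
// get_similare_key() as given in this answer.
function get_similare_key($arr, $name) {
    $names = preg_split("/\s+/", $name);
    foreach ($arr as $key => $value) {
        $key_names = preg_split("/\s+/", $key);
        // words of the existing key that are missing from $name
        $diff = array_diff($key_names, $names);
        if (count($diff) <= 1) {
            return $key;
        }
    }
    return false;
}

$data = [
    ['name' => 'product 1 brandX', 'id_product' => '77777777'],
    ['name' => 'brandX product 1', 'id_product' => '77777777'],
    ['name' => 'brandX product 1 RED', 'id_product' => '77777777'],
    ['name' => 'product 1 brandX', 'id_product' => ''],
    ['name' => 'product 2 brandY', 'id_product' => '8888888'],
    ['name' => 'product 2 brandY RED', 'id_product' => ''],
];

// name of the first entry in each group => number of entries in that group
$counts = [];
foreach ($data as $row) {
    $key = get_similare_key($counts, $row['name']);
    if ($key === false) {
        $counts[$row['name']] = 1; // no similar name yet: start a new group
    } else {
        $counts[$key]++;
    }
}

// Convert to the name/count shape asked for in the question.
$uniques = [];
foreach ($counts as $name => $count) {
    $uniques[] = ['name' => $name, 'count' => $count];
}
print_r($uniques);
```

With the sample data this yields two groups with counts 4 and 2. Note that the one-word tolerance is loose: a name such as 'product 1 brandY' would also be merged into the 'product 1 brandX' group, because only one word ('brandX') differs.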
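Answer 2 (score: 0)
My answer is based on two assumptions about how products should be grouped:
1. Although id_product may be missing, where it is present it is correct and sufficient to match two products; and
2. For two product names to match, the longer name (the one with the most words) must contain every word of the shorter name (the one with the fewest words).
Given these assumptions, here is a function that decides whether two individual products match (i.e. should be grouped together), plus a helper function that extracts the words from a product name: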
function productsMatch(array $product1, array $product2)
{
    if (
        !empty($product1['id_product'])
        && !empty($product2['id_product'])
        && $product1['id_product'] === $product2['id_product']
    ) {
        // match based on id_product
        return true;
    }
    $words1 = getWordsFromProduct($product1);
    $words2 = getWordsFromProduct($product2);
    $min_word_count = min(count($words1), count($words2));
    $match_word_count = count(array_intersect_key($words1, $words2));
    if ($min_word_count >= 1 && $match_word_count === $min_word_count) {
        // match based on name similarity
        return true;
    }
    // no match
    return false;
}
function getWordsFromProduct(array $product)
{
    $name = mb_strtolower($product['name']);
    preg_match_all('/\S+/', $name, $matches);
    // use the words as array keys so array_intersect_key() can compare them
    $words = array_flip($matches[0]);
    return $words;
}
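To see how the two rules behave on the question's data, here is a small sketch that copies the two functions above and runs a few pairwise checks (the choice of pairs is illustrative only). It uses list() in foreach so it stays compatible with PHP 5.6, which the answer was tested on.

```php
<?php
// productsMatch() and getWordsFromProduct() as in this answer (condensed).
function productsMatch(array $product1, array $product2)
{
    if (
        !empty($product1['id_product'])
        && !empty($product2['id_product'])
        && $product1['id_product'] === $product2['id_product']
    ) {
        return true; // same non-empty id_product
    }
    $words1 = getWordsFromProduct($product1);
    $words2 = getWordsFromProduct($product2);
    $min_word_count = min(count($words1), count($words2));
    $match_word_count = count(array_intersect_key($words1, $words2));
    // every word of the shorter name must appear in the longer one
    return $min_word_count >= 1 && $match_word_count === $min_word_count;
}

function getWordsFromProduct(array $product)
{
    preg_match_all('/\S+/', mb_strtolower($product['name']), $matches);
    return array_flip($matches[0]); // words become array keys
}

$data = [
    ['name' => 'product 1 brandX', 'id_product' => '77777777'],
    ['name' => 'brandX product 1', 'id_product' => '77777777'],
    ['name' => 'brandX product 1 RED', 'id_product' => '77777777'],
    ['name' => 'product 1 brandX', 'id_product' => ''],
    ['name' => 'product 2 brandY', 'id_product' => '8888888'],
    ['name' => 'product 2 brandY RED', 'id_product' => ''],
];

$checks = [
    [0, 1], // same id_product
    [0, 3], // identical names, one id missing
    [4, 5], // 'product 2 brandY' is contained in 'product 2 brandY RED'
    [2, 5], // the names share only 'product' and 'red'
    [0, 4], // different ids and different names
];
foreach ($checks as list($i, $j)) {
    $result = productsMatch($data[$i], $data[$j]) ? 'match' : 'no match';
    printf("%d vs %d: %s\n", $i, $j, $result);
}
```

One consequence of the first assumption is visible in the code: two different non-empty ids do not prevent a match, because the name comparison still runs when the id check fails.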
This function can be used to group the products:
function groupProducts(array $data)
{
    $groups = array();
    foreach ($data as $product1) {
        foreach ($groups as $key => $products) {
            foreach ($products as $product2) {
                if (productsMatch($product1, $product2)) {
                    $groups[$key][] = $product1;
                    continue 3; // foreach ($data as $product1)
                }
            }
        }
        $groups[] = array($product1);
    }
    return $groups;
}
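One property of groupProducts() worth knowing: a product joins the first group containing any product it matches, and groups are never merged afterwards, so the number of groups can depend on input order. The sketch below demonstrates this with three hypothetical names (not from the question), where 'red x blue y' matches both 'red x' and 'blue y' but those two do not match each other.

```php
<?php
// productsMatch(), getWordsFromProduct() and groupProducts() as in this answer.
function productsMatch(array $product1, array $product2)
{
    if (
        !empty($product1['id_product'])
        && !empty($product2['id_product'])
        && $product1['id_product'] === $product2['id_product']
    ) {
        return true;
    }
    $words1 = getWordsFromProduct($product1);
    $words2 = getWordsFromProduct($product2);
    $min_word_count = min(count($words1), count($words2));
    $match_word_count = count(array_intersect_key($words1, $words2));
    return $min_word_count >= 1 && $match_word_count === $min_word_count;
}

function getWordsFromProduct(array $product)
{
    preg_match_all('/\S+/', mb_strtolower($product['name']), $matches);
    return array_flip($matches[0]);
}

function groupProducts(array $data)
{
    $groups = array();
    foreach ($data as $product1) {
        foreach ($groups as $key => $products) {
            foreach ($products as $product2) {
                if (productsMatch($product1, $product2)) {
                    $groups[$key][] = $product1;
                    continue 3; // next product in $data
                }
            }
        }
        $groups[] = array($product1); // no match: start a new group
    }
    return $groups;
}

// Hypothetical products without ids.
$a = ['name' => 'red x', 'id_product' => ''];
$b = ['name' => 'blue y', 'id_product' => ''];
$c = ['name' => 'red x blue y', 'id_product' => ''];

echo count(groupProducts([$a, $b, $c])), "\n"; // 2 groups: {a, c} and {b}
echo count(groupProducts([$c, $a, $b])), "\n"; // 1 group: {c, a, b}
```

With the question's data the order does not matter, but if it does for your data, you would need to decide whether a product that matches two groups should merge them.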
This function can then be used to extract the shortest name and the count for each group:
function uniqueProducts(array $groups)
{
    $uniques = array();
    foreach ($groups as $products) {
        $shortest_name = '';
        $shortest_length = PHP_INT_MAX;
        $count = 0;
        foreach ($products as $product) {
            $length = mb_strlen($product['name']);
            if ($length < $shortest_length) {
                $shortest_name = $product['name'];
                $shortest_length = $length;
            }
            $count++;
        }
        $uniques[] = array(
            'name' => $shortest_name,
            'count' => $count,
        );
    }
    return $uniques;
}
So, combining all 4 functions, you can get the uniques as follows (tested with PHP 5.6):
$data[0]['name'] = 'product 1 brandX';
$data[0]['id_product'] = '77777777';
$data[1]['name'] = 'brandX product 1';
$data[1]['id_product'] = '77777777';
$data[2]['name'] = 'brandX product 1 RED';
$data[2]['id_product'] = '77777777';
$data[3]['name'] = 'product 1 brandX';
$data[3]['id_product'] = '';
$data[4]['name'] = 'product 2 brandY';
$data[4]['id_product'] = '8888888';
$data[5]['name'] = 'product 2 brandY RED';
$data[5]['id_product'] = '';
$groups = groupProducts($data);
$uniques = uniqueProducts($groups);
var_dump($uniques);
This gives the output:
array(2) {
  [0]=>
  array(2) {
    ["name"]=>
    string(16) "product 1 brandX"
    ["count"]=>
    int(4)
  }
  [1]=>
  array(2) {
    ["name"]=>
    string(16) "product 2 brandY"
    ["count"]=>
    int(2)
  }
}