Mongo age group aggregation

Time: 2014-08-06 00:48:30

Tags: php mongodb distinct aggregation-framework

Please consider the following collection:

 $people->insert(array("user_id" => "1", "day" => "Monday", 'age' => 18));
 $people->insert(array("user_id" => "3", "day" => "Monday", 'age' => 24));
 $people->insert(array("user_id" => "1", "day" => "Monday", 'age' => 18));
 $people->insert(array("user_id" => "1", "day" => "Monday", 'age' => 18));
 $people->insert(array("user_id" => "2", "day" => "Monday", 'age' => 25));
 $people->insert(array("user_id" => "4", "day" => "Monday", 'age' => 33));
 $people->insert(array("user_id" => "1", "day" => "Tuesday", 'age' => 18));
 $people->insert(array("user_id" => "2", "day" => "Tuesday", 'age' => 25));
 $people->insert(array("user_id" => "1", "day" => "Wednesday", 'age' => 18));
 $people->insert(array("user_id" => "2", "day" => "Thursday", 'age' => 25));
 $people->insert(array("user_id" => "1", "day" => "Friday", 'age' => 18));

Can anyone help me count the number of distinct users within each age group? For example, for the data above, I would like to get

      Age 0-17 = 0, Age 18-25 = 3, Age 26-32 = 0, Age > 32 = 1

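I have tried using the $cond operator, but I have not managed to get it running. Whenever I try to run or change it, I get one of the following two errors: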

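  1. The "$cond" operator requires 3 operands, or
  2. A pipeline stage specification object must contain exactly one field.

My query is as follows; any help is greatly appreciated. Thanks in advance.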

        $query =
            array(
               '$project' => array(
                    'ageGroup' => array(
                       array('$cond'=>  array('$user_data.age' => array('$lt' => 18),
                                               "age_0_17",
                       array('$cond'=>  array('$user_data.age' => array('$lte' => 25),
                                               "age_18_25",
                       array('$cond'=>  array('$user_data.age' => array('$lte' => 32),
                                               "age_26_32",
                                               "age_Above_32")))))
                        )
                    ),
                ),
    
                array(
                    '$group' => array(
                        '_id'  => '$ageGroup',
                        'count' => array('$sum' => 1),
                    )
                ));
    

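@Neil Lunn's answer is 90% correct: it did not give me the required output, but it led me there.

With Neil's query, the output I get is:

    age_Above_32 = 1 and age_18_25 = 10

The output for a count of distinct user_id values should be:

    age_Above_32 = 1 and age_18_25 = 3

To get this, I only needed to tweak Neil's query slightly: group on both ageGroup and user_id first, so that each user appears once per age group, then count the resulting pairs per ageGroup. The final query is as follows.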

    $query2 = array(
            array(
                '$group' => array(
                    '_id' => array(
                        'ageGroup' => array(
                            '$cond' =>  array(
                                array('$lt' => array( '$age', 18 )),
                                'age_0_17',
                                array(
                                    '$cond' => array(
                                        array( '$lte' => array( '$age', 25 )),
                                        'age_18_25',
                                        array(
                                            '$cond' => array(
                                                array( '$lte' => array ( '$age', 32 )),
                                                'age_26_32',
                                                'age_Above_32'
                                            )
                                        )
                                    )
                                )
                            )
                        ),
                        'user_id' =>'$user_id'
                    )
                )
    
            ),
            array(
                '$group' => array(
                    '_id'  => '$_id.ageGroup',
                    'count' => array('$sum' => 1)
                ))
        );
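
To see why the extra grouping key changes the count from 10 to 3, here is a minimal plain-PHP sketch that mimics the two $group stages on the sample data without touching MongoDB. The helper `ageGroup()` and the variable names are hypothetical and only stand in for the nested $cond expression; this is an illustration of the logic, not of how the server evaluates the pipeline.

```php
<?php
// Sample documents copied from the question (user_id, day, age).
$docs = array(
    array('user_id' => '1', 'day' => 'Monday',    'age' => 18),
    array('user_id' => '3', 'day' => 'Monday',    'age' => 24),
    array('user_id' => '1', 'day' => 'Monday',    'age' => 18),
    array('user_id' => '1', 'day' => 'Monday',    'age' => 18),
    array('user_id' => '2', 'day' => 'Monday',    'age' => 25),
    array('user_id' => '4', 'day' => 'Monday',    'age' => 33),
    array('user_id' => '1', 'day' => 'Tuesday',   'age' => 18),
    array('user_id' => '2', 'day' => 'Tuesday',   'age' => 25),
    array('user_id' => '1', 'day' => 'Wednesday', 'age' => 18),
    array('user_id' => '2', 'day' => 'Thursday',  'age' => 25),
    array('user_id' => '1', 'day' => 'Friday',    'age' => 18),
);

// Hypothetical helper mirroring the nested $cond expression.
function ageGroup($age) {
    if ($age < 18)  { return 'age_0_17'; }
    if ($age <= 25) { return 'age_18_25'; }
    if ($age <= 32) { return 'age_26_32'; }
    return 'age_Above_32';
}

// Stage 1: group on (ageGroup, user_id); repeated pairs collapse into one key.
$pairs = array();
foreach ($docs as $doc) {
    $group = ageGroup($doc['age']);
    $pairs[$group . '|' . $doc['user_id']] = $group;
}

// Stage 2: group on ageGroup alone and count the surviving pairs.
$distinctCounts = array_count_values($pairs);

// For comparison: a single group on ageGroup counts every document.
$documentCounts = array_count_values(array_map(function ($doc) {
    return ageGroup($doc['age']);
}, $docs));

ksort($distinctCounts);
ksort($documentCounts);
print_r($distinctCounts);
print_r($documentCounts);
```

As in the aggregation output, age groups with no documents (age_0_17 and age_26_32 here) do not appear at all, because a group only exists when at least one document falls into it.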
    

1 answer:

Answer 0 (score: 2)

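You are on the right track, but since $cond requires three arguments (the condition to evaluate, the result when it is true, and the result when it is false), you need to "nest" these operations, with each subsequent $cond serving as the false branch of the previous one. So your syntax is a little off.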
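
As a rough illustration of that nesting, the expression can also be generated from a list of boundaries instead of being written by hand. The function `buildAgeCond()` below is a hypothetical helper, not part of any driver, and it assumes the field is always `$age`; it only builds the PHP array and does not talk to MongoDB.

```php
<?php
// Hypothetical helper: turn an ordered list of array(operator, bound, label)
// branches plus a default label into a nested $cond expression on '$age'.
function buildAgeCond(array $branches, $default) {
    $expr = $default;
    // Work from the last branch backwards so that each $cond becomes the
    // "false" operand of the branch before it.
    foreach (array_reverse($branches) as $branch) {
        list($op, $bound, $label) = $branch;
        $expr = array(
            '$cond' => array(
                array($op => array('$age', $bound)), // condition
                $label,                              // value when true
                $expr,                               // value when false
            ),
        );
    }
    return $expr;
}

$cond = buildAgeCond(
    array(
        array('$lt',  18, 'age_0_17'),
        array('$lte', 25, 'age_18_25'),
        array('$lte', 32, 'age_26_32'),
    ),
    'age_Above_32'
);

echo json_encode($cond), "\n";
```

Every $cond built this way has exactly three operands, which is the shape the first error message asks for.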

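You can also do this directly in $group, which avoids passing the whole collection through a separate $project stage. Based on the document structure you provided, you would form it like this: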

$pipeline = array(
  array(
    '$group' => array(
      '_id' => array(
        '$cond' =>  array(
          array('$lt' => array( '$age', 18 )),
          'age_0_17',
          array(
            '$cond' => array(
              array( '$lte' => array( '$age', 25 )),
              'age_18_25',
              array(
                '$cond' => array(
                  array( '$lte' => array ( '$age', 32 )),
                  'age_26_32',
                  'age_Above_32'
                )
              )
            )
          )
        )
      ),
      'count' => array( '$sum' => 1 )
    )
  )
);

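Also note that logical comparison operators such as $lt work differently in these stages than their query counterparts. Here they take an array of arguments, which are the values to test and compare, and they return true/false based on that comparison, which is what the first argument of $cond requires.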
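
The difference between the two forms can be put side by side in a small sketch. The variable names and the label `age_18_plus` are made up for illustration; the code only builds arrays and prints their JSON.

```php
<?php
// Query form, as used in find() or a $match stage: the field name is the key.
$queryForm = array('age' => array('$lt' => 18));

// Aggregation expression form: the operator is the key and takes an array
// of operands; '$age' is a field path that refers to the document's value.
$expressionForm = array('$lt' => array('$age', 18));

// The expression form is the shape $cond expects as its first operand.
// 'age_18_plus' is a made-up label for this two-way example.
$cond = array('$cond' => array($expressionForm, 'age_0_17', 'age_18_plus'));

echo json_encode($queryForm), "\n";      // {"age":{"$lt":18}}
echo json_encode($expressionForm), "\n"; // {"$lt":["$age",18]}
echo json_encode($cond), "\n";
```

The query in the question used the first shape inside $cond (`'$user_data.age' => array('$lt' => 18)`), which is not an expression that evaluates to true/false.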

It is always handy to json_encode the pipeline when debugging its form, since JSON is the notation in which most examples are written:

echo json_encode( $pipeline, JSON_PRETTY_PRINT ) . "\n";

This produces the common JSON structure:

[
    { "$group": {
        "_id": { 
            "$cond":[
                { "$lt":["$age",18] },
                "age_0_17",
                { "$cond":[
                    { "$lte":["$age",25] },
                    "age_18_25",
                    { "$cond":[
                        { "$lte":["$age",32] },
                        "age_26_32",
                        "age_Above_32"
                    ]}
                ]}
            ]
        },
        "count":{ "$sum": 1 }
    }}
]