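MySQL: counting changes of a column value in a result set

Date: 2013-12-04 17:49:43

Tags: mysql count subquery distinct

I have two tables:

1) task - represents a task. It has only a primary key, because all related data lives in the task_version table (task HAS_MANY task_version).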

CREATE TABLE task(
  id int(11) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (id)
);

Sample data:

INSERT INTO task VALUES ('1');
INSERT INTO task VALUES ('2');

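2) task_version - every change to any task creates a new row in this table. task_id should be a foreign key (omitted for simplicity). This table holds the full history of all changes made to a task.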

CREATE TABLE `task_version` (
id int(10) unsigned NOT NULL AUTO_INCREMENT,
task_id int(11) DEFAULT NULL,
name varchar(255) DEFAULT NULL,
text varchar(255) DEFAULT NULL,
status int(11) DEFAULT NULL,
PRIMARY KEY (id)
);

Sample data:

INSERT INTO `task_version` VALUES ('1', '1', 'Name of task', 'Text of task', '1');
INSERT INTO `task_version` VALUES ('2', '1', 'Name of task', 'Text of task', '1');
INSERT INTO `task_version` VALUES ('3', '1', 'Name of task', 'Text of task', '2');
INSERT INTO `task_version` VALUES ('4', '1', 'Name of task', 'Text of task', '1');
INSERT INTO `task_version` VALUES ('5', '2', 'Name', 'Text', '1');

What I need is the number of status changes for each task.

Obviously I can't just query for distinct statuses like this:

SELECT
(
  SELECT
  COUNT(DISTINCT status)
  FROM task_version
  WHERE task_id = t.id
) AS distinct_statuses_per_task,
t.id AS task_id
FROM task t
INNER JOIN task_version tv ON t.id = tv.task_id
GROUP BY t.id

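because distinct_statuses_per_task is only the number of distinct values, not the number of changes. If someone changes the status from 1 to 2, from 2 to 1, and then from 1 to 2 again, we get this sequence of statuses:

1
2
1
2

So we have 2 distinct statuses (1, 2) but 3 status changes (1 > 2, 2 > 1, 1 > 2), which is why counting distinct values doesn't work.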

I developed a solution using MySQL user variables. This is the subquery I want to embed into the main query:

SELECT
CASE WHEN (status != @prev_status AND @prev_status IS NOT NULL)
THEN @status_changes_quantity := @status_changes_quantity + 1
END as incrementing_logic,
@status_changes_quantity AS status_changes_quantity,
@prev_status := status AS save_prev
FROM task_version,
(
    SELECT
    @prev_status := NULL,
    @status_changes_quantity := 0
) as task_version_with_additional_vars
WHERE task_id = 1 -- hardcoded task_id
ORDER BY status_changes_quantity DESC
LIMIT 1

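This works as a standalone query with a hardcoded task_id. But I need to embed it as a subquery so that I get the number of status changes for every task.

I can't get that to work. The problem is that when I set variables in the SELECT part of the query, they become part of the result set. A subquery used this way has to return a single scalar, but mine returns a table (incrementing_logic, status_changes_quantity, save_prev), and I don't know the syntax for getting rid of the unwanted columns (incrementing_logic, save_prev).

I tried this: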

SELECT
(
    SELECT
    CASE WHEN (status != @prev_status AND @prev_status IS NOT NULL)
    THEN @status_changes_quantity := @status_changes_quantity + 1
    END as incrementing_logic,
    @status_changes_quantity AS status_changes_quantity,
    @prev_status := status AS save_prev
    FROM task_version,
    (
        SELECT
        @prev_status := NULL,
        @status_changes_quantity := 0
    ) as task_version_with_additional_vars
    WHERE task_id = t.id
    ORDER BY status_changes_quantity DESC
    LIMIT 1
) AS status_changes_quantity,
t.id AS task_id,
tv.status AS task_status
FROM task t
INNER JOIN task_version tv ON t.id = tv.task_id

and, as expected, got:

[Err] 1241 - Operand should contain 1 column(s)

Then I tried wrapping the subquery in another temporary (derived) table to drop the variable columns and get a scalar value:

SELECT
(
    SELECT
    status_changes_quantity
    FROM
    (
        SELECT

            CASE WHEN (status != @prev_status AND @prev_status IS NOT NULL)
            THEN @status_changes_quantity := @status_changes_quantity + 1
            END as incrementing_logic,

            @status_changes_quantity AS status_changes_quantity,

            @prev_status := status AS save_prev

        FROM task_version,
            (
                SELECT
                    @prev_status := NULL,
                    @status_changes_quantity := 0
            ) as task_version_with_additional_vars
        WHERE task_id = t.id
        ORDER BY status_changes_quantity DESC
        LIMIT 1
    ) AS tmp_table
) AS status_changes_quantity,
t.id AS task_id,
tv.status AS task_status
FROM task t
INNER JOIN task_version tv ON t.id = tv.task_id

This gives a different error, because t.id is no longer visible inside the nested subquery:

[Err] 1054 - Unknown column 't.id' in 'where clause'

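Maybe someone knows how to solve my problem, either by correcting my query or by suggesting a completely different algorithm.

Thanks in advance.

2 answers:

Answer 0: (score: 0)

I modified your query slightly: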

SELECT task_id, max( status_changes_quantity )
FROM (
  SELECT 
      task_id, id, 
      CASE WHEN @prev_task_id <> task_id 
             THEN @status_changes_quantity := 0
           WHEN status != @prev_status 
             THEN @status_changes_quantity := @status_changes_quantity + 1
           ELSE @status_changes_quantity
      END status_changes_quantity,
      @prev_task_id := task_id,
      @prev_status := status
  FROM task_version,
  (
      SELECT
      @prev_status := NULL,
      @prev_task_id := null,
      @status_changes_quantity := 0
  ) as task_version_with_additional_vars
  -- WHERE task_id = 1
  ORDER BY task_id, id
) q
GROUP BY task_id
ORDER BY 2 DESC

Demo -> http://www.sqlfiddle.com/#!2/c9ecc/14

This query counts the number of status changes for all task_ids. It also works for a single given task if you uncomment the -- WHERE task_id = 1 clause.
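
For comparison, the same counts can be sketched without user variables at all. This may be worth considering because the MySQL manual does not guarantee the evaluation order of expressions that both read and assign a user variable. The query below is only a sketch, not part of the original answer. It assumes that a larger task_version.id always means a later version of the task, and the aliases cur and prev are illustrative names.

```sql
-- Assumption: within one task, ascending task_version.id is chronological order.
SELECT
    t.id AS task_id,
    (
        SELECT COUNT(*)
        FROM task_version cur
        WHERE cur.task_id = t.id
          -- Compare each version with the one immediately before it.
          -- For the first version the inner subquery yields NULL, the
          -- comparison is NULL, and the row is not counted.
          AND cur.status <> (
              SELECT prev.status
              FROM task_version prev
              WHERE prev.task_id = cur.task_id
                AND prev.id < cur.id
              ORDER BY prev.id DESC
              LIMIT 1
          )
    ) AS status_changes_quantity
FROM task t
ORDER BY t.id;
```

With the sample data from the question, task 1 has the status sequence 1, 1, 2, 1 and so gets 2 changes, while task 2 has a single version and gets 0. The nested correlated subquery can be slow on large tables; an index on (task_id, id) would probably help.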

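Answer 1: (score: 0)

@kordirko, thanks a lot. Your correction worked. Actually, following this article http://www.xaprb.com/blog/2006/12/15/advanced-mysql-user-variable-techniques/ I managed to remove the variable assignments from the result set, which avoids the temporary table.

All I needed to do (if I understood it correctly) was to hide the variable assignments inside the GREATEST function, in an additional WHERE condition that always evaluates to TRUE, like this: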

WHERE task_id = t.id
AND GREATEST(
    @var1 := if(1 = 1, 'some_value', 'alt_value'), -- conditional logic instead of CASE WHEN
    @var := 123 -- simple assignment
) -- this should evaluate to true

So the final version looks like this:

SELECT
(
    SELECT
    MAX(@status_changes_quantity) AS status_changes_quantity
    FROM task_version,
    (
        SELECT
        @prev_status := NULL,
        @status_changes_quantity := 0,
        @prev_task_id :=0
    ) as task_version_with_additional_vars
    WHERE GREATEST(
              @status_changes_quantity := if(task_id != @prev_task_id, 0, @status_changes_quantity),
              @prev_task_id := task_id,
              @status_changes_quantity := if((status != @prev_status AND @prev_status IS NOT NULL), @status_changes_quantity + 1, @status_changes_quantity),
              @prev_status := status
    )
    AND task_id = t.id
) AS status_changes_quantity,
t.id AS task_id
FROM task t
INNER JOIN task_version tv ON t.id = tv.task_id
GROUP BY t.id
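
On MySQL 8.0 or later, which did not exist when this was written, the previous status can be read with the LAG window function instead of user variables. The following is a sketch under that version assumption, not the method used in either answer. It also assumes that ascending task_version.id is the order in which changes happened. The alias v and the column name changed are illustrative.

```sql
-- Assumption: MySQL 8.0+ (window functions); id order is chronological.
SELECT
    t.id AS task_id,
    -- changed is 1 when the status differs from the previous version,
    -- 0 when it is the same, and NULL for the first version of a task.
    -- SUM skips NULLs; COALESCE turns an all-NULL sum into 0.
    COALESCE(SUM(v.changed), 0) AS status_changes_quantity
FROM task t
LEFT JOIN (
    SELECT
        task_id,
        status <> LAG(status) OVER (PARTITION BY task_id ORDER BY id) AS changed
    FROM task_version
) v ON v.task_id = t.id
GROUP BY t.id
ORDER BY t.id;
```

With the sample data this should give 2 for task 1 and 0 for task 2. The LEFT JOIN keeps tasks that have no versions at all, which the INNER JOIN in the queries above would drop.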