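Group/cluster long repeated results into data columns

Time: 2012-08-22 10:04:21

Tags: mysql

I am collecting some information in MySQL rather than Excel. A number of labels are defined for each cell type, and not every label is necessarily present. So I have three tables: labels, information, and cells.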

select cell_name, label, information from onco_celldb_information as info 
left join onco_celldb_cells as cell on cell.`celldb_cell_id` = info.`celldb_cell_id`
left join onco_celldb_labels as label on info.`celldb_label_id` = label.`celldb_label_id`
order by cell.celldb_cell_id asc;

This results in:

Screenshot of the result of running the query above: http://f.cl.ly/items/0m2k1a410s3D0K2Y0l1u/Screen%20Shot%202012-08-22%20at%2011.57.36%20AM.png

What I want, however, is something like this:

CellName   Species     CellType     Origin
---------+-----------+------------+-----------
P-815      Murine      Mastroxxxx   Human
L292       Something   Megatrone    Mouse

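So the rows should be grouped by cell name, with the label values returned as columns. If a label does not exist for a cell, the column should just be NULL (some cells may lack some labels).

Do you have any suggestions?

Edit, with the database structure: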

mysql> describe celldb_cells;
+----------------+------------------+------+-----+---------+----------------+
| Field          | Type             | Null | Key | Default | Extra          |
+----------------+------------------+------+-----+---------+----------------+
| celldb_cell_id | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| cell_name      | varchar(256)     | YES  |     | NULL    |                |
+----------------+------------------+------+-----+---------+----------------+

describe celldb_information;
+-----------------------+------------------+------+-----+---------+----------------+
| Field                 | Type             | Null | Key | Default | Extra          |
+-----------------------+------------------+------+-----+---------+----------------+
| celldb_information_id | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| celldb_cell_id        | int(11) unsigned | YES  | MUL | NULL    |                |
| celldb_label_id       | int(11) unsigned | NO   | MUL | NULL    |                |
| information           | text             | YES  |     | NULL    |                |
+-----------------------+------------------+------+-----+---------+----------------+

describe celldb_labels;
+-----------------+------------------+------+-----+---------+----------------+
| Field           | Type             | Null | Key | Default | Extra          |
+-----------------+------------------+------+-----+---------+----------------+
| celldb_label_id | int(11) unsigned | NO   | PRI | NULL    | auto_increment |
| label           | varchar(256)     | YES  |     | NULL    |                |
+-----------------+------------------+------+-----+---------+----------------+

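3 Answers:

Answer 0 (score: 2)

What you are trying to do is called a PIVOT. Unfortunately MySQL has no PIVOT function, but you can replicate it with CASE expressions and an aggregate function.

If you know all the labels ahead of time and their number is manageable, you can hard-code them like this: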

SELECT cell_name,
  MAX(CASE WHEN label = 'Cell Type' THEN information END) 'Cell Type',
  MAX(CASE WHEN label = 'DSMZ no.' THEN information END) 'DSMZ no.'
FROM test
GROUP BY cell_name

See SQL Fiddle with Demo
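
To make the hard-coded pivot easy to try without a MySQL server, here is a minimal sketch in Python that runs the same MAX(CASE ...) pattern against an in-memory SQLite database. SQLite is only a stand-in for MySQL here, and the `test` table contents are hypothetical sample rows loosely based on the question, not real data.

```python
import sqlite3

# Hypothetical sample rows: one row per (cell, label) pair.
rows = [
    ("P-815", "Species", "Murine"),
    ("P-815", "Cell Type", "Mastroxxxx"),
    ("L292", "Species", "Something"),  # L292 has no 'Cell Type' row
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (cell_name TEXT, label TEXT, information TEXT)")
conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)

# CASE without ELSE yields NULL for non-matching rows; MAX ignores NULLs,
# so each group keeps the one matching value (or NULL if there is none).
pivot_sql = """
SELECT cell_name,
       MAX(CASE WHEN label = 'Species'   THEN information END) AS "Species",
       MAX(CASE WHEN label = 'Cell Type' THEN information END) AS "Cell Type"
FROM test
GROUP BY cell_name
ORDER BY cell_name
"""
result = conn.execute(pivot_sql).fetchall()
for row in result:
    print(row)
```

A cell without a given label simply gets None (NULL) in that column, which matches what the question asks for.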

Based on your query, you could do the following:

SELECT cell_name,
  MAX(CASE WHEN label = 'Cell Type' THEN information END) 'Cell Type',
  MAX(CASE WHEN label = 'DSMZ no.' THEN information END) 'DSMZ no.'
from onco_celldb_information as info 
left join onco_celldb_cells as cell 
  on cell.`celldb_cell_id` = info.`celldb_cell_id`
left join onco_celldb_labels as label 
  on info.`celldb_label_id` = label.`celldb_label_id`
GROUP BY cell_name

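However, it looks like you will have an unknown number of columns, so you will need to build the query dynamically and run it as a prepared statement: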

SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(case when label = ''',
      label,
      ''' then information end) AS ''',
      label, ''''
    )
  ) INTO @sql
FROM test;


SET @sql = CONCAT('SELECT cell_name, ', @sql, ' FROM test
group by cell_name');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

See SQL Fiddle with Demo

So, for your specific example, it would be:

SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(case when label = ''',
      label,
      ''' then information end) AS ''',
      label, ''''
    )
  ) INTO @sql
FROM onco_celldb_labels;

SET @sql = CONCAT('SELECT cell_name, ', @sql, ' 
from onco_celldb_information as info 
left join onco_celldb_cells as cell 
  on cell.`celldb_cell_id` = info.`celldb_cell_id`
left join onco_celldb_labels as label 
  on info.`celldb_label_id` = label.`celldb_label_id`              
group by cell_name');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

Answer 1 (score: 1)

If you know the number of labels, you can "pivot" the data so that each label becomes a column.

select cell_name,
  max(case when info.celldb_label_id = 1 then information else NULL end) as LabelForInfo1,
  max(case when info.celldb_label_id = 2 then information else NULL end) as LabelForInfo2,
  max(case when info.celldb_label_id = 3 then information else NULL end) as LabelForInfo3,
  ..
from
 onco_celldb_cells as cell
 left join onco_celldb_information as info on cell.celldb_cell_id = info.celldb_cell_id
group by cell.celldb_cell_id, cell.cell_name
order by cell.celldb_cell_id asc;
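
One detail of the query above is that it starts from the cells table and left-joins the information table, so a cell with no information rows still appears, with NULL in every label column. The following sketch compares both join directions using Python and an in-memory SQLite database as a stand-in for MySQL; the shortened table names `cells` and `information` and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cells (celldb_cell_id INTEGER PRIMARY KEY, cell_name TEXT);
CREATE TABLE information (
    celldb_information_id INTEGER PRIMARY KEY,
    celldb_cell_id INTEGER,
    celldb_label_id INTEGER NOT NULL,
    information TEXT
);
INSERT INTO cells VALUES (1, 'P-815'), (2, 'L292');
-- Only P-815 has an information row; L292 has none.
INSERT INTO information (celldb_cell_id, celldb_label_id, information)
VALUES (1, 1, 'Murine');
""")

# Starting from cells keeps every cell, even one without information.
from_cells = conn.execute("""
SELECT cell.cell_name,
       MAX(CASE WHEN info.celldb_label_id = 1 THEN info.information END) AS label_1
FROM cells AS cell
LEFT JOIN information AS info ON cell.celldb_cell_id = info.celldb_cell_id
GROUP BY cell.celldb_cell_id, cell.cell_name
ORDER BY cell.celldb_cell_id
""").fetchall()

# Starting from information only returns cells that have at least one row there.
from_info = conn.execute("""
SELECT cell.cell_name,
       MAX(CASE WHEN info.celldb_label_id = 1 THEN info.information END) AS label_1
FROM information AS info
LEFT JOIN cells AS cell ON cell.celldb_cell_id = info.celldb_cell_id
GROUP BY cell.cell_name
ORDER BY cell.cell_name
""").fetchall()

print(from_cells)
print(from_info)
```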

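If the number and names of the labels are not known, you can build the above query dynamically from the contents of onco_celldb_labels. First, generate the "dynamic" columns for the query above by running the following query: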

select concat(
  'max(case when info.celldb_label_id = ',
   convert(celldb_label_id,char),
   ' then information else NULL end) as `',
   label,
   '`,')
from onco_celldb_labels

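Now join all the returned rows into a single string (dropping the trailing comma after the last generated column), add the beginning and end of the main query around it, and execute the result. This gives you dynamic labels. As far as I know, this is the only way to pivot a table in MySQL.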
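
The "generate the columns, join them, wrap them in the main query, execute" procedure can also be driven from application code. Below is a rough sketch of that idea in Python, using an in-memory SQLite database as a stand-in for MySQL. The shortened table names, the sample rows, and the `quote_ident` helper are all hypothetical; the helper uses SQL-standard double-quote identifier quoting, whereas MySQL would normally use backticks.

```python
import sqlite3

def quote_ident(name):
    # Hypothetical helper: wrap in double quotes and double any embedded quote.
    return '"' + name.replace('"', '""') + '"'

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cells (celldb_cell_id INTEGER PRIMARY KEY, cell_name TEXT);
CREATE TABLE labels (celldb_label_id INTEGER PRIMARY KEY, label TEXT);
CREATE TABLE information (celldb_cell_id INTEGER, celldb_label_id INTEGER, information TEXT);
INSERT INTO cells VALUES (1, 'P-815'), (2, 'L292');
INSERT INTO labels VALUES (1, 'Species'), (2, 'Cell Type');
INSERT INTO information VALUES
    (1, 1, 'Murine'), (1, 2, 'Mastroxxxx'), (2, 1, 'Something');
""")

# Step 1: read the labels that should become columns.
labels = conn.execute(
    "SELECT celldb_label_id, label FROM labels ORDER BY celldb_label_id"
).fetchall()

# Step 2: one MAX(CASE ...) expression per label; joining with ", "
# avoids the trailing-comma problem of concatenating row by row.
columns = ", ".join(
    f"MAX(CASE WHEN info.celldb_label_id = {int(label_id)} "
    f"THEN info.information END) AS {quote_ident(label)}"
    for label_id, label in labels
)

# Step 3: wrap the generated columns in the main query and execute it.
sql = (
    f"SELECT cell.cell_name AS cell_name, {columns} "
    "FROM cells AS cell "
    "LEFT JOIN information AS info ON cell.celldb_cell_id = info.celldb_cell_id "
    "GROUP BY cell.celldb_cell_id, cell.cell_name "
    "ORDER BY cell.celldb_cell_id"
)
cursor = conn.execute(sql)
header = [d[0] for d in cursor.description]
result = cursor.fetchall()
print(header)
print(result)
```

Matching on the numeric label ID rather than the label text keeps arbitrary label strings out of the WHEN conditions; the label text is only used, quoted, as the column alias.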

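Answer 2 (score: 0)

This is not a very pretty solution, but if you only want a few labels as columns and can specify which ones, something like this should work. Note that the query as written assumes onco_celldb_information exposes cell_name and label directly (for example through a view over the three-table join); in the schema shown in the question, that table stores only celldb_cell_id and celldb_label_id.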

SELECT
    s1.cell_name AS cell_name,
    s2.information AS Species,
    s3.information AS Origin
    -- Keep adding selects here for more columns
FROM
    (SELECT distinct cell_name FROM onco_celldb_information) AS s1
    LEFT JOIN onco_celldb_information AS s2
        ON (s1.cell_name = s2.cell_name AND s2.label = 'Species')
    LEFT JOIN onco_celldb_information AS s3
        ON (s1.cell_name = s3.cell_name AND s3.label = 'Origin')
    -- Keep adding more joins here for further columns you want.
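
For comparison with the aggregate-based answers, here is a small sketch of this self-join approach in Python, run against an in-memory SQLite database as a stand-in for MySQL. The `flat` table is a hypothetical flattened table with cell_name, label, and information columns, and the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical flattened table: one row per (cell, label) pair.
CREATE TABLE flat (cell_name TEXT, label TEXT, information TEXT);
INSERT INTO flat VALUES
    ('P-815', 'Species', 'Murine'),
    ('P-815', 'Origin',  'Human'),
    ('L292',  'Species', 'Something');
""")

# One LEFT JOIN per wanted column; a missing label leaves that column NULL.
sql = """
SELECT s1.cell_name, s2.information AS Species, s3.information AS Origin
FROM (SELECT DISTINCT cell_name FROM flat) AS s1
LEFT JOIN flat AS s2 ON s1.cell_name = s2.cell_name AND s2.label = 'Species'
LEFT JOIN flat AS s3 ON s1.cell_name = s3.cell_name AND s3.label = 'Origin'
ORDER BY s1.cell_name
"""
result = conn.execute(sql).fetchall()
for row in result:
    print(row)
```

Unlike the MAX(CASE ...) variants, this form returns several rows for a cell that has more than one row for the same label, so it fits best when each (cell, label) pair occurs at most once.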