MySQL: Converting a Database Column from a Blob into Separate Parts

Time: 2013-09-11 19:28:01

Tags: mysql sql database migration

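I'm trying to take a blob out of an old MySQL table and create a new table for it, in order to reach first normal form. However, converting the data already in the database from the blob into multiple rows in the new table is turning out to be non-trivial.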

What is the simplest way to do the conversion using SQL commands?

Parent table:

CREATE  TABLE TEST.People (
  `id` INT  AUTO_INCREMENT,
  `age` INT,
  `height` INT,
  `weight` INT  ,
  `variations` BLOB DEFAULT NULL,
  PRIMARY KEY (`id`)
);

New table:

CREATE  TABLE TEST.Variations (
  `id` INT  AUTO_INCREMENT,
  `chr` INT,
  `start` INT,
  `stop` INT  ,
  `type` ENUM('SNP','INDEL','CNV') DEFAULT NULL,
  PRIMARY KEY (`id`)
);

When I run SELECT id, variations FROM TEST.People; I get:

+----+----------------------------------------------------------------------------------------------------------------------+
| id | variations                                                                                                           |
+----+----------------------------------------------------------------------------------------------------------------------+
|  3 | xp   t !3:124093754-124467278/CNVt 7:78030601-79638023/CNV                                                           |
|  6 | xp                                                                                                                   |
|  9 | xp                                                                                                                   |
| 12 | xp   t !1:84289718-85466763/CNV                                                                                      |
| 15 | xp                                                                                                                   |
| 18 | xp                                                                                                                   |
| 21 | xp                                                                                                                   |
| 24 | xp                                                                                                                   |
| 27 | xp                                                                                                                   |
| 30 | xp   t !10:166909544-166909544/SNPt !2:66903445-66903445/SNPt !2:166897864-166897864/CNVt !7:6892788-6892788/SNP     |
+----+----------------------------------------------------------------------------------------------------------------------+

So I'd like the converted TEST.Variations table to look like this:

+----+-----+-----------+-----------+----------+
| id | chr | start     | stop      | type     |  
+----+-----+-----------+-----------+----------+
|  3 |   3 | 124093754 | 124467278 | CNV      |
|  3 |   7 |  78030601 |  79638023 | CNV      |
| 12 |   1 |  84289718 |  85466763 | CNV      |
| 30 |  10 | 166909544 | 166909544 | SNP      |
| 30 |   2 |  66903445 |  66903445 | SNP      |
| 30 |   2 | 166897864 | 166897864 | CNV      |
| 30 |   7 |   6892788 |   6892788 | SNP      |
+----+-----+-----------+-----------+----------+

1 Answer:

Answer 0: (score: 1)

Two things first:

  1. Your data for id 3 is inconsistent: there is no ! before 7:.... I hope that's just a typo.

    xp   t !3:124093754-124467278/CNVt 7:78030601-79638023/CNV
                                      ^^
    
  2. If you want an auto_increment column in the target table, your schema should look something like this:

    CREATE  TABLE variations 
    (
      `var_id` INT NOT NULL AUTO_INCREMENT,
      `id`    INT, -- id from People goes here and it's not UNIQUE
      `chr`   INT,
      `start` INT,
      `stop`  INT ,
      `type`  ENUM('SNP','INDEL','CNV') DEFAULT NULL,
      PRIMARY KEY (`var_id`) 
    );
    
  3. Now you can use a query to transfer the data from People to Variations:

    INSERT INTO variations (id, chr, start, stop, type)
    SELECT id, 
           SUBSTRING_INDEX(variation, ':', 1) chr,
           SUBSTRING_INDEX(SUBSTRING_INDEX(variation, '-', 1), ':', -1) start,
           SUBSTRING_INDEX(SUBSTRING_INDEX(variation, '-', -1), '/', 1) stop,
           SUBSTRING_INDEX(variation, '/', -1) type
      FROM
    (
      SELECT p.id, SUBSTRING_INDEX(SUBSTRING_INDEX(p.variations, 't !', n.n), 't !', -1) variation
        FROM 
      (
        SELECT id, SUBSTR(variations, 9) variations
          FROM people 
         WHERE variations LIKE 'xp   t !%'
      ) p CROSS JOIN 
      (
         SELECT a.N + b.N * 10 + 1 n
           FROM 
          (SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
         ,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
          ORDER BY n
      ) n
       WHERE n.n <= 1 + (LENGTH(p.variations) - LENGTH(REPLACE(p.variations, 't !', ''))) / 3
       ORDER BY id
    ) q
     ORDER BY id, chr, start, stop, type;
    

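    Note: this query splits out at most 100 variations per id. If you need more or fewer, you can adjust the limit by editing the inner subquery aliased n, which generates a numbers (tally) table on the fly.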
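
    As a rough sketch of that adjustment, the numbers subquery can be given a third digit so it yields 1 to 1000 instead of 1 to 100. The alias c and the column name d are my own additions for illustration; the rest of the INSERT ... SELECT would stay as it is.

    ```sql
    -- Sketch: tally table producing n = 1 .. 1000 (units, tens, hundreds digits).
    -- Aliases a, b follow the query above; c and the column name d are illustrative.
    SELECT a.d + b.d * 10 + c.d * 100 + 1 AS n
      FROM (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
            UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
     CROSS JOIN
           (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
            UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
     CROSS JOIN
           (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
            UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
     ORDER BY n;
    ```

    Dropping the b part (and the * 10 term) would go the other way and cap the split at 10 variations per id.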

    Result (contents of variations after the INSERT):

    | VAR_ID | ID | CHR |     START |      STOP | TYPE |
    |--------|----|-----|-----------|-----------|------|
    |      1 |  3 |   3 | 124093754 | 124467278 |  CNV |
    |      2 |  3 |   7 |  78030601 |  79638023 |  CNV |
    |      3 | 12 |   1 |  84289718 |  85466763 |  CNV |
    |      4 | 30 |  10 | 166909544 | 166909544 |  SNP |
    |      5 | 30 |   2 | 166897864 | 166897864 |  CNV |
    |      6 | 30 |   2 |  66903445 |  66903445 |  SNP |
    |      7 | 30 |   7 |   6892788 |   6892788 |  SNP |
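
    To see how the nested SUBSTRING_INDEX calls in the query pick a record apart, it may help to run them on literals. This is only an illustration: the user variables @variation and @list are made up here, and the values are copied from the sample data in the question.

    ```sql
    -- One record in the form chr:start-stop/type
    SET @variation = '3:124093754-124467278/CNV';

    SELECT SUBSTRING_INDEX(@variation, ':', 1)                           AS `chr`,   -- text before the first ':'
           SUBSTRING_INDEX(SUBSTRING_INDEX(@variation, '-', 1), ':', -1) AS `start`, -- before '-', then after ':'
           SUBSTRING_INDEX(SUBSTRING_INDEX(@variation, '-', -1), '/', 1) AS `stop`,  -- after '-', then before '/'
           SUBSTRING_INDEX(@variation, '/', -1)                          AS `type`;  -- text after the last '/'
    -- chr = '3', start = '124093754', stop = '124467278', type = 'CNV'

    -- Picking the n-th record out of a 't !'-delimited list (here n = 2)
    SET @list = '10:166909544-166909544/SNPt !2:66903445-66903445/SNP';

    SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(@list, 't !', 2), 't !', -1) AS second_item;
    -- second_item = '2:66903445-66903445/SNP'
    ```

    The pieces come back as strings; MySQL converts them implicitly when they are inserted into the INT columns.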
    

    Here is a SQLFiddle demo
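
Regarding the inconsistency mentioned in point 1, one way to look for similar rows before migrating is to compare how often 't ' occurs with how often 't !' occurs. This is a sketch under a strong assumption: that the bytes in the blob really are the literal characters shown in the SELECT output above, and that 't ' never occurs inside a record itself.

```sql
-- Sketch: list rows where some 't ' delimiter is not followed by '!'.
-- Assumes the delimiter is literally 't !' as displayed in the question.
SELECT id, variations
  FROM people
 WHERE (LENGTH(variations) - LENGTH(REPLACE(variations, 't ', '')))  / 2   -- occurrences of 't '
    <> (LENGTH(variations) - LENGTH(REPLACE(variations, 't !', ''))) / 3;  -- occurrences of 't !'
```

With the sample data as posted, this should flag id 3, where the second record is introduced by 't ' without the '!'. Rows it returns would need to be fixed (or handled separately) before running the INSERT, since the split relies on 't !' as the only delimiter.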