复杂的SQL语句对我来说越来越好

时间:2013-02-15 09:21:51

标签: mysql

我知道我一定是个疯狂的东西但是我试图用一个相当复杂的陈述(至少对我来说)查询数据库而且我得到的行数超出了我的预期,有人知道怎么'修复'这个?

我正在查询的表格如下:

glycoPeptide | CREATE TABLE `glycoPeptide` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `protein` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1 |

run   | CREATE TABLE `run` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `glycoPeptide` int(11) NOT NULL,
  `run` enum('spectrum','chromatogram') NOT NULL,
  `glycoType` enum('N','O') DEFAULT NULL,
  `glycoSite` int(11) DEFAULT NULL,
  `pepMass` varchar(5) DEFAULT NULL,
  `pepSeq` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `glycoPeptide` (`glycoPeptide`),
  CONSTRAINT `run_ibfk_1` FOREIGN KEY (`glycoPeptide`) REFERENCES `glycoPeptide` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1 |

spectrum | CREATE TABLE `spectrum` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `run` int(11) NOT NULL,
  `glycoform` varchar(255) DEFAULT NULL,
  `spectrum` enum('m/z','intensity') NOT NULL,
  PRIMARY KEY (`id`),
  KEY `run` (`run`),
  CONSTRAINT `spectrum_ibfk_1` FOREIGN KEY (`run`) REFERENCES `run` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=29 DEFAULT CHARSET=latin1 |

precursor | CREATE TABLE `precursor` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `run` int(11) NOT NULL,
  `retentionTime` time DEFAULT NULL,
  `mzValue` float DEFAULT NULL,
  `chargeState` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `run` (`run`),
  CONSTRAINT `precursor_ibfk_1` FOREIGN KEY (`run`) REFERENCES `run` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=15 DEFAULT CHARSET=latin1 |

binaryDataArray | CREATE TABLE `binaryDataArray` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `spectrum` int(11) NOT NULL,
  `arrayLength` int(11) NOT NULL,
  `EncodedLength` int(11) NOT NULL,
  `arrayData` text,
  PRIMARY KEY (`id`),
  KEY `spectrum` (`spectrum`),
  CONSTRAINT `binaryDataArray_ibfk_1` FOREIGN KEY (`spectrum`) REFERENCES `spectrum` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=29 DEFAULT CHARSET=latin1 |

我有一些测试数据用于2种蛋白质(IgG和IgE)。 IgG仅包含1次运行,仅含有1个糖苷,因此只有1'组'的binaryDataArrays。 IgE含有3个糖苷,因此有3次运行,每次运行可以包含多个光谱(每个一组2个binaryDataArray)。

我使用以下查询(使用JOINS会更漂亮,我知道):

select
  precursor.mzValue,
  glycoPeptide.protein,
  binaryDataArray.arrayLength,
  binaryDataArray.encodedLength,
  precursor.chargeState,
  run.pepMass,
  run.PepSeq
from
  precursor,
  glycoPeptide,
  binaryDataArray,
  spectrum,
  run
where
  run.glycoPeptide = glycoPeptide.id AND
  spectrum.run = run.id AND
  precursor.run = run.id AND
  binaryDataArray.spectrum = spectrum.id AND
  spectrum.spectrum like 'm/z' AND
  precursor.mzValue like '1196.79' AND
  glycoPeptide.protein like 'IgE' AND
  run.glycoSite like '252' AND
  run.glycoType like 'N';

产生IgG的结果就像我预期的那样:

+---------+---------+-------------+---------------+-------------+---------+-----------+
| mzValue | protein | arrayLength | encodedLength | chargeState | pepMass | PepSeq    |
+---------+---------+-------------+---------------+-------------+---------+-----------+
|   933.4 | IgG     |       10301 |         22912 |           3 | 1189.   | EEQYNSTYR |
+---------+---------+-------------+---------------+-------------+---------+-----------+
1 row in set (0.00 sec)

对于IgE(使用上面的陈述),我得到以下结果:

+---------+---------+-------------+---------------+-------------+---------+-----------+
| mzValue | protein | arrayLength | encodedLength | chargeState | pepMass | PepSeq    |
+---------+---------+-------------+---------------+-------------+---------+-----------+
| 1196.79 | IgE     |       10301 |        109880 |           3 | 1033.   | GTVNLTWSR |
| 1196.79 | IgE     |       10301 |         54940 |           3 | 1033.   | GTVNLTWSR |
| 1196.79 | IgE     |       10301 |         54940 |           3 | 1033.   | GTVNLTWSR |
+---------+---------+-------------+---------------+-------------+---------+-----------+
3 rows in set (0.00 sec)

虽然我希望这里只有一排,但我似乎无法理解它。

非常感谢任何帮助

- 编辑1 -

我编写where子句的方式应该完全按照我所知的方式进行,所以不应该是问题......

- 编辑2 -

示例数据:

select * from glycoPeptide;
+----+---------+
| id | protein |
+----+---------+
|  1 | IgG     |
|  2 | IgE     |
+----+---------+
2 rows in set (0.00 sec)

mysql> select * from run;
+----+--------------+----------+-----------+-----------+---------+-----------------+
| id | glycoPeptide | run      | glycoType | glycoSite | pepMass | pepSeq          |
+----+--------------+----------+-----------+-----------+---------+-----------------+
|  1 |            1 | spectrum | N         |       297 | 1189.   | EEQYNSTYR       |
|  2 |            2 | spectrum | N         |       275 | 1516.   | NGTLTVTSTLPVGTR |
|  3 |            2 | spectrum | N         |       252 | 1033.   | GTVNLTWSR       |
|  4 |            2 | spectrum | N         |        99 | 1556.   | VAHTPSSTDWVDNK  |
+----+--------------+----------+-----------+-----------+---------+-----------------+
4 rows in set (0.00 sec)

select * from precursor;
+----+-----+---------------+---------+-------------+
| id | run | retentionTime | mzValue | chargeState |
+----+-----+---------------+---------+-------------+
|  1 |   1 | 00:13:32      |   933.4 |           3 |
|  2 |   2 | 00:00:00      |  965.55 |           2 |
|  3 |   2 | 00:00:00      | 912.036 |           2 |
|  4 |   2 | 00:00:00      | 1127.06 |           3 |
|  5 |   3 | 00:00:00      | 1099.97 |           2 |
|  6 |   3 | 00:00:00      |  1153.9 |           3 |
|  7 |   3 | 00:00:00      | 1196.79 |           3 |
|  8 |   4 | 00:00:00      |  1109.5 |           2 |
|  9 |   4 | 00:00:00      | 1157.66 |           2 |
| 10 |   4 | 00:00:00      | 1225.66 |           2 |
| 11 |   4 | 00:00:00      | 1206.47 |           3 |
| 12 |   4 | 00:00:00      | 1328.31 |           3 |
| 13 |   4 | 00:00:00      | 1304.09 |           3 |
| 14 |   4 | 00:00:00      | 1165.04 |           2 |
+----+-----+---------------+---------+-------------+
14 rows in set (0.00 sec)

mysql> select * from spectrum;
+----+-----+-----------+-----------+
| id | run | glycoform | spectrum  |
+----+-----+-----------+-----------+
|  1 |   1 | G1F       | m/z       |
|  2 |   1 | G1F       | intensity |
|  3 |   2 | NULL      | m/z       |
|  4 |   2 | NULL      | intensity |
|  5 |   2 | NULL      | m/z       |
|  6 |   2 | NULL      | intensity |
|  7 |   2 | NULL      | m/z       |
|  8 |   2 | NULL      | intensity |
|  9 |   3 | NULL      | m/z       |
| 10 |   3 | NULL      | intensity |
| 11 |   3 | NULL      | m/z       |
| 12 |   3 | NULL      | intensity |
| 13 |   3 | NULL      | m/z       |
| 14 |   3 | NULL      | intensity |
| 15 |   4 | NULL      | m/z       |
| 16 |   4 | NULL      | intensity |
| 17 |   4 | NULL      | m/z       |
| 18 |   4 | NULL      | intensity |
| 19 |   4 | NULL      | m/z       |
| 20 |   4 | NULL      | intensity |
| 21 |   4 | NULL      | m/z       |
| 22 |   4 | NULL      | intensity |
| 23 |   4 | NULL      | m/z       |
| 24 |   4 | NULL      | intensity |
| 25 |   4 | NULL      | m/z       |
| 26 |   4 | NULL      | intensity |
| 27 |   4 | NULL      | m/z       |
| 28 |   4 | NULL      | intensity |
+----+-----+-----------+-----------+
28 rows in set (0.00 sec)

mysql> select id, spectrum, arrayLength, encodedLength from binaryDataArray;
+----+----------+-------------+---------------+
| id | spectrum | arrayLength | encodedLength |
+----+----------+-------------+---------------+
|  1 |        1 |       10301 |         22912 |
|  2 |        2 |       10301 |          3092 |
|  3 |        3 |       10301 |         54940 |
|  4 |        4 |       10301 |        109880 |
|  5 |        5 |       10301 |         54940 |
|  6 |        6 |       10301 |        109880 |
|  7 |        7 |       10301 |        102408 |
|  8 |        8 |       10301 |        109880 |
|  9 |        9 |       10301 |        109880 |
| 10 |       10 |       10301 |         54940 |
| 11 |       11 |       10301 |         54940 |
| 12 |       12 |       10301 |        109880 |
| 13 |       13 |       10301 |         54940 |
| 14 |       14 |       10301 |        109880 |
| 15 |       15 |       10301 |        109880 |
| 16 |       16 |       10301 |         54940 |
| 17 |       17 |       10301 |         54940 |
| 18 |       18 |       10301 |        109880 |
| 19 |       19 |       10301 |        109880 |
| 20 |       20 |       10301 |         54940 |
| 21 |       21 |       10301 |        109880 |
| 22 |       22 |       10301 |         54940 |
| 23 |       23 |       10301 |         54940 |
| 24 |       24 |       10301 |        109880 |
| 25 |       25 |       10301 |         54940 |
| 26 |       26 |       10301 |        109880 |
| 27 |       27 |       10301 |        109880 |
| 28 |       28 |       10301 |         54940 |
+----+----------+-------------+---------------+
28 rows in set (0.00 sec)

- 编辑3 -

当前所需的数据无法从数据库中收集,因为其中一个关系不存在(需要能够将光谱链接到前体)。我要感谢Radical和Jack先生帮助揭开这个漏洞并接受了Jack的回答,因为他在查询中加入的符号比我的做法更容易阅读。

2 个答案:

答案 0 :(得分:1)

首先,我会像这样重写你的查询;更容易看到连接条件是什么,并保持where子句清洁:

select
  precursor.mzValue,
  glycoPeptide.protein,
  binaryDataArray.arrayLength,
  binaryDataArray.encodedLength,
  precursor.chargeState,
  run.pepMass,
  run.PepSeq
from
  precursor
  inner join glycoPeptide on run.glycoPeptide = glycoPeptide.id
  inner join binaryDataArray on binaryDataArray.spectrum = spectrum.id
  inner join spectrum on spectrum.run = run.id
  inner join run on precursor.run = run.id
where
  spectrum.spectrum like 'm/z' AND
  precursor.mzValue like '1196.79' AND
  glycoPeptide.protein like 'IgE' AND
  run.glycoSite like '252' AND
  run.glycoType like 'N';

查询的问题在于spectrum表。来自run的联接产生三行,spectrum.id为9,11或13。

|  9 |   3 | NULL      | m/z       |
| 11 |   3 | NULL      | m/z       |
| 13 |   3 | NULL      | m/z       |

答案 1 :(得分:-1)

MySql JOIN会帮助你。你的问题太长了,所以我给你一个简单的JOIN实例

可以使用tbl_name AS alias_name或tbl_name alias_name别名表引用:

SELECT t1.name, t2.salary
  FROM employee AS t1 INNER JOIN info AS t2 ON t1.name = t2.name;

SELECT t1.name, t2.salary
  FROM employee t1 INNER JOIN info t2 ON t1.name = t2.name;

有关详细信息,请访问: http://www.w3schools.com/sql/sql_join.asp