MySQL SUBSTRING using position and length values from the query vs. a programming-language substring after retrieval

Date: 2013-04-23 20:46:43

Tags: mysql substring multiple-select-query

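I am accessing a MySQL database that uses the Chado schema. I search for a gene product; in this example, the product is "bifunctional GDP-fucose synthetase: GDP-4-dehydro-6-deoxy-D-mannose epimerase and GDP-4-dehydro-6-L-deoxygalactose reductase".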

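I can then use JOIN statements to find the assembly this gene lies on and its coordinates. The SQL statement below works: it returns the sequence of the whole assembly (not just the gene's sequence), together with the start and end positions of the gene of interest within that assembly.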

SELECT f.uniquename AS protein_accession, product.value AS protein_name, srcfeature.residues AS residue_sequence, srcassembly.name AS source_type, location.fmin AS location_min, location.fmax AS location_max, location.strand
FROM feature f
JOIN cvterm polypeptide ON f.type_id=polypeptide.cvterm_id
JOIN featureprop product ON f.feature_id=product.feature_id
JOIN cvterm productprop ON product.type_id=productprop.cvterm_id
JOIN featureloc location ON f.feature_id=location.feature_id
JOIN feature srcfeature ON location.srcfeature_id=srcfeature.feature_id
JOIN cvterm srcassembly ON srcfeature.type_id=srcassembly.cvterm_id
WHERE polypeptide.name = 'polypeptide'
AND productprop.name = 'gene_product_name'
AND product.value LIKE '%bifunctional GDP-fucose synthetase: GDP-4-dehydro-6-deoxy-D-mannose epimerase and GDP-4-dehydro-6-L-deoxygalactose reductase%';

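The assembly sequence is very long, and I definitely don't need all of it. Is it better to use MySQL's SUBSTRING function to extract only the part I need, so the whole sequence is never retrieved, or to retrieve everything and use my programming language's substring method afterwards? The query below is my attempt to pass the position and length values obtained in the same query to SUBSTRING. It doesn't work, and my guess is that it needs multiple SELECT statements. The SQL is getting very ugly, and I'm not even sure a working version would be any better.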

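What do you think: is it better to do this with SQL's SUBSTRING, or to retrieve the whole sequence anyway and use the programming language's substring method to display only what I want?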
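For comparison, the application-side option can be sketched in Python. This is a minimal sketch assuming Chado's interbase convention (fmin is 0-based, fmax is exclusive), under which a Python slice needs no offset. The helper name `extract_feature` and the reverse-complement step for `strand == -1` are illustrative assumptions, not part of the original question.

```python
# Minimal sketch: cut a feature out of an already-retrieved assembly sequence.
# Assumes Chado interbase coordinates: fmin is 0-based, fmax is exclusive.
# extract_feature is a hypothetical helper, not a Chado or MySQL API.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_feature(residues: str, fmin: int, fmax: int, strand: int = 1) -> str:
    """Return the feature's residues, reverse-complemented on the minus strand."""
    if not 0 <= fmin <= fmax <= len(residues):
        raise ValueError(
            f"location {fmin}..{fmax} outside sequence of length {len(residues)}"
        )
    segment = residues[fmin:fmax]  # interbase coordinates map directly onto a slice
    if strand == -1:
        segment = segment.translate(_COMPLEMENT)[::-1]
    return segment


if __name__ == "__main__":
    assembly = "AAAACCCGGGTTTT"
    print(extract_feature(assembly, 4, 10))     # CCCGGG
    print(extract_feature(assembly, 0, 6, -1))  # GGTTTT
```

The slice itself is trivial; the cost this approach cannot avoid is that the entire assembly still has to travel from the database to the application for every matching row.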

SELECT f.uniquename AS protein_accession, product.value AS protein_name, SUBSTRING(srcfeature.residues AS residue_sequence, location_min, location_max - location_min), srcassembly.name AS source_type, location.fmin AS location_min, location.fmax AS location_max, location.strand
FROM feature f
JOIN cvterm polypeptide ON f.type_id=polypeptide.cvterm_id
JOIN featureprop product ON f.feature_id=product.feature_id
JOIN cvterm productprop ON product.type_id=productprop.cvterm_id
JOIN featureloc location ON f.feature_id=location.feature_id
JOIN feature srcfeature ON location.srcfeature_id=srcfeature.feature_id
JOIN cvterm srcassembly ON srcfeature.type_id=srcassembly.cvterm_id
WHERE polypeptide.name = 'polypeptide'
AND productprop.name = 'gene_product_name'
AND product.value LIKE '%bifunctional GDP-fucose synthetase: GDP-4-dehydro-6-deoxy-D-mannose epimerase and GDP-4-dehydro-6-L-deoxygalactose reductase%';

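EDIT: Here is a sample result for a different gene (one with a shorter name). I left the sequence column out of the result because it is thousands of characters long. I need SUBSTRING to use the location_min and location_max values shown here correctly.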

+-------------------+---------------------------------------------------+-------------+--------------+--------------+--------+
| protein_accession | protein_name                                      | source_type | location_min | location_max | strand |
+-------------------+---------------------------------------------------+-------------+--------------+--------------+--------+
| ECDH10B_0026      | bifunctional riboflavin kinase and FAD synthetase | assembly    |        21406 |        22348 |      1 |
+-------------------+---------------------------------------------------+-------------+--------------+--------------+--------+
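Assuming Chado's interbase convention (0-based fmin, exclusive fmax), the values in this sample row translate into MySQL's 1-based `SUBSTRING(str, pos, len)` arguments as pos = fmin + 1 and len = fmax - fmin. A small Python sketch of that arithmetic; the helper name `substring_args` is hypothetical:

```python
# Hypothetical helper: convert Chado interbase coordinates (0-based fmin,
# exclusive fmax) into the (pos, len) arguments of MySQL's 1-based SUBSTRING().


def substring_args(fmin: int, fmax: int) -> tuple[int, int]:
    if fmin < 0 or fmax < fmin:
        raise ValueError("expected 0 <= fmin <= fmax")
    return fmin + 1, fmax - fmin


pos, length = substring_args(21406, 22348)  # values from the sample row
print(pos, length)  # 21407 942 -> 1-based positions 21407..22348 inclusive
```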

2 answers:

Answer 0 (score: 1)

Your `AS` is in the wrong place. It needs to come after the closing parenthesis of substring():

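SELECT f.uniquename AS protein_accession, product.value AS protein_name,
       -- Aliases such as location_min cannot be referenced elsewhere in the same
       -- SELECT list, so use the underlying columns. Chado's fmin is a 0-based
       -- (interbase) coordinate while MySQL's SUBSTRING() is 1-based, hence fmin + 1.
       SUBSTRING(srcfeature.residues, location.fmin + 1, location.fmax - location.fmin) AS residue_sequence,
       srcassembly.name AS source_type, location.fmin AS location_min, location.fmax AS location_max, location.strand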
FROM feature f
JOIN cvterm polypeptide ON f.type_id=polypeptide.cvterm_id
JOIN featureprop product ON f.feature_id=product.feature_id
JOIN cvterm productprop ON product.type_id=productprop.cvterm_id
JOIN featureloc location ON f.feature_id=location.feature_id
JOIN feature srcfeature ON location.srcfeature_id=srcfeature.feature_id
JOIN cvterm srcassembly ON srcfeature.type_id=srcassembly.cvterm_id
WHERE polypeptide.name = 'polypeptide'
AND productprop.name = 'gene_product_name'
AND product.value LIKE '%bifunctional GDP-fucose synthetase: GDP-4-dehydro-6-deoxy-D-mannose epimerase and GDP-4-dehydro-6-L-deoxygalactose reductase%';
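To sanity-check the `fmin + 1` offset without a Chado/MySQL instance, here is a sketch that uses Python's built-in `sqlite3` module as a stand-in; SQLite's `substr()` is 1-based like MySQL's `SUBSTRING()`. The two-table schema and its data are a drastically simplified, hypothetical slice of Chado's `feature` and `featureloc` tables.

```python
import sqlite3

# Simplified, hypothetical stand-ins for Chado's feature and featureloc tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT, residues TEXT);
    CREATE TABLE featureloc (feature_id INTEGER, srcfeature_id INTEGER,
                             fmin INTEGER, fmax INTEGER, strand INTEGER);
    INSERT INTO feature VALUES (1, 'toy_assembly', 'AAAACCCGGGTTTT');
    INSERT INTO feature VALUES (2, 'toy_protein', NULL);
    INSERT INTO featureloc VALUES (2, 1, 4, 10, 1);  -- interbase: 0-based indices 4..9
    """
)

# Same shape as the corrected query: use the real columns rather than
# select-list aliases, and shift the 0-based fmin to a 1-based start position.
row = conn.execute(
    """
    SELECT f.uniquename,
           substr(src.residues, loc.fmin + 1, loc.fmax - loc.fmin) AS residue_sequence,
           loc.fmin, loc.fmax
    FROM feature f
    JOIN featureloc loc ON f.feature_id = loc.feature_id
    JOIN feature src ON loc.srcfeature_id = src.feature_id
    WHERE f.uniquename = 'toy_protein'
    """
).fetchone()

print(row)  # ('toy_protein', 'CCCGGG', 4, 10)
conn.close()
```

Only the six extracted characters come back in the result row; the full assembly string never leaves the database.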

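As for your other question, I think it makes more sense to extract the data you want in the query rather than pass unnecessary data back to the application. When the assembly is thousands of characters long and you need only a short stretch of it, this saves a lot of communication overhead. Also, if the database uses multiple threads/processors, it has the opportunity to do the work in parallel.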

Answer 1 (score: 0)

If something like this works for you:

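SELECT f.uniquename AS protein_accession,
       product.value AS protein_name,
       -- PATINDEX() and LEN() are SQL Server functions; the MySQL equivalents are LOCATE()
       -- (1-based position of a literal substring, 0 if absent) and CHAR_LENGTH().
       SUBSTRING(
                   srcfeature.residues,
                   LOCATE('SOMEPATTERN', srcfeature.residues),
                   CHAR_LENGTH(srcfeature.residues) - LOCATE('SOMEPATTERN', srcfeature.residues) + 1
                ) AS residue_sequence,
       srcassembly.name AS source_type,
       -- ... remaining columns, FROM/JOIN and WHERE clauses as in the question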

then do it in SQL. If not, use the application's programming language.
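If the pattern search ends up in the application instead, note the index conventions: Python's `str.find()` is 0-based and returns -1 when the pattern is absent, whereas MySQL's `LOCATE()` is 1-based and returns 0. A minimal sketch; `SOMEPATTERN` in the answer is a placeholder, and the helper name `from_pattern` is hypothetical:

```python
# Hypothetical helper mirroring the SQL above on the application side:
# return everything from the first occurrence of `pattern` to the end.


def from_pattern(residues: str, pattern: str) -> str:
    start = residues.find(pattern)  # 0-based index, -1 if not found
    if start == -1:
        return ""  # assumed to match MySQL, where SUBSTRING(str, 0, ...) yields ''
    return residues[start:]


print(repr(from_pattern("AAAACCCGGGTTTT", "CCG")))  # 'CCGGGTTTT'
print(repr(from_pattern("AAAACCCGGGTTTT", "XYZ")))  # ''
```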