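I'm accessing a Chado-schema MySQL database. I search for a gene product; for this example, the product is "bifunctional GDP-fucose synthetase: GDP-4-dehydro-6-deoxy-D-mannose epimerase and GDP-4-dehydro-6-L-deoxygalactose reductase".
I can then use JOIN statements to find which assembly the gene is on and what its coordinates are. The following SQL statement works: it returns the sequence of the whole assembly (not just the gene's sequence), along with the start and end positions of the gene of interest within the assembly.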
SELECT f.uniquename AS protein_accession, product.value AS protein_name, srcfeature.residues AS residue_sequence, srcassembly.name AS source_type, location.fmin AS location_min, location.fmax AS location_max, location.strand
FROM feature f
JOIN cvterm polypeptide ON f.type_id=polypeptide.cvterm_id
JOIN featureprop product ON f.feature_id=product.feature_id
JOIN cvterm productprop ON product.type_id=productprop.cvterm_id
JOIN featureloc location ON f.feature_id=location.feature_id
JOIN feature srcfeature ON location.srcfeature_id=srcfeature.feature_id
JOIN cvterm srcassembly ON srcfeature.type_id=srcassembly.cvterm_id
WHERE polypeptide.name = 'polypeptide'
AND productprop.name = 'gene_product_name'
AND product.value LIKE '%bifunctional GDP-fucose synthetase: GDP-4-dehydro-6-deoxy-D-mannose epimerase and GDP-4-dehydro-6-L-deoxygalactose reductase%';
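The assembly sequence is very long, and I definitely don't need all of it. Is it better to extract just the part I need with MySQL's SUBSTRING function, which avoids retrieving the whole sequence, or to use a programming language's substring method after retrieval? The query below is my attempt to pass the position and length values obtained in the same query to SUBSTRING. It doesn't work, and my guess is that it would need multiple SELECT statements. The SQL is getting very ugly, and I'm not even sure a working end result would be any better.
Do you have any thoughts on whether it's better to do this with SQL SUBSTRING, or simply to use a programming language's substring method to display what I want, even though I've already retrieved the whole thing?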
SELECT f.uniquename AS protein_accession, product.value AS protein_name, SUBSTRING(srcfeature.residues AS residue_sequence, location_min, location_max - location_min), srcassembly.name AS source_type, location.fmin AS location_min, location.fmax AS location_max, location.strand
FROM feature f
JOIN cvterm polypeptide ON f.type_id=polypeptide.cvterm_id
JOIN featureprop product ON f.feature_id=product.feature_id
JOIN cvterm productprop ON product.type_id=productprop.cvterm_id
JOIN featureloc location ON f.feature_id=location.feature_id
JOIN feature srcfeature ON location.srcfeature_id=srcfeature.feature_id
JOIN cvterm srcassembly ON srcfeature.type_id=srcassembly.cvterm_id
WHERE polypeptide.name = 'polypeptide'
AND productprop.name = 'gene_product_name'
AND product.value LIKE '%bifunctional GDP-fucose synthetase: GDP-4-dehydro-6-deoxy-D-mannose epimerase and GDP-4-dehydro-6-L-deoxygalactose reductase%';
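Edit: Here is an example result for a different gene (one with a shorter name). I omitted the sequence column from the result because it is thousands of characters long. I need to do the SUBSTRING correctly using the location_min and location_max values shown here.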
+-------------------+---------------------------------------------------+-------------+--------------+--------------+--------+
| protein_accession | protein_name | source_type | location_min | location_max | strand |
+-------------------+---------------------------------------------------+-------------+--------------+--------------+--------+
| ECDH10B_0026 | bifunctional riboflavin kinase and FAD synthetase | assembly | 21406 | 22348 | 1 |
+-------------------+---------------------------------------------------+-------------+--------------+--------------+--------+
Answer 0 (score: 1)
Your AS is in the wrong place. It needs to go after the closing parenthesis of SUBSTRING():
SELECT f.uniquename AS protein_accession, product.value AS protein_name,
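-- Column aliases such as location_min cannot be referenced in the same SELECT list, so use the columns directly.
-- Assuming Chado's interbase (0-based) fmin, add 1 because MySQL's SUBSTRING is 1-based.
SUBSTRING(srcfeature.residues, location.fmin + 1, location.fmax - location.fmin) AS residue_sequence,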
srcassembly.name AS source_type, location.fmin AS location_min, location.fmax AS location_max, location.strand
FROM feature f
JOIN cvterm polypeptide ON f.type_id=polypeptide.cvterm_id
JOIN featureprop product ON f.feature_id=product.feature_id
JOIN cvterm productprop ON product.type_id=productprop.cvterm_id
JOIN featureloc location ON f.feature_id=location.feature_id
JOIN feature srcfeature ON location.srcfeature_id=srcfeature.feature_id
JOIN cvterm srcassembly ON srcfeature.type_id=srcassembly.cvterm_id
WHERE polypeptide.name = 'polypeptide'
AND productprop.name = 'gene_product_name'
AND product.value LIKE '%bifunctional GDP-fucose synthetase: GDP-4-dehydro-6-deoxy-D-mannose epimerase and GDP-4-dehydro-6-L-deoxygalactose reductase%';
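As for your other question, I think it makes more sense to extract the data you want in the query than to pass unnecessary data back to the application. This saves communication overhead. In addition, if the database uses multiple threads or processors, it has the opportunity to run the work in parallel.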
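To make the in-query approach concrete, here is a minimal sketch in Python. It uses the standard-library `sqlite3` module as a stand-in for MySQL (SQLite's `SUBSTR` is 1-based like MySQL's `SUBSTRING`), a heavily simplified two-table version of the Chado schema, and made-up demo rows; `build_demo_db` and `fetch_subsequence` are hypothetical names. It also assumes `fmin` is an interbase (0-based) coordinate, hence the `+ 1`.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a toy subset of Chado's tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY, uniquename TEXT, residues TEXT);
        CREATE TABLE featureloc (
            feature_id INTEGER, srcfeature_id INTEGER,
            fmin INTEGER, fmax INTEGER, strand INTEGER);
        -- Made-up demo data: one short "assembly" and one gene located on it.
        INSERT INTO feature VALUES (1, 'demo_assembly', 'AACCGGTTAACC');
        INSERT INTO feature VALUES (2, 'demo_gene', NULL);
        INSERT INTO featureloc VALUES (2, 1, 2, 6, 1);
    """)
    return conn


def fetch_subsequence(conn, uniquename):
    """Return (uniquename, subsequence) rows, slicing inside the query."""
    sql = """
        SELECT f.uniquename,
               -- Assumed interbase fmin: shift by 1 for the 1-based SUBSTR.
               SUBSTR(src.residues, loc.fmin + 1, loc.fmax - loc.fmin)
                   AS residue_sequence
        FROM feature f
        JOIN featureloc loc ON f.feature_id = loc.feature_id
        JOIN feature src ON loc.srcfeature_id = src.feature_id
        WHERE f.uniquename = ?
    """
    return conn.execute(sql, (uniquename,)).fetchall()


print(fetch_subsequence(build_demo_db(), "demo_gene"))
# → [('demo_gene', 'CCGG')]
```

Only the four requested characters leave the database here, which is the point of slicing in the query: with a real assembly, the saving is the difference between a gene-sized string and a genome-sized one.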
Answer 1 (score: 0)
If something like this works for you:
SELECT f.uniquename AS protein_accession,
product.value AS protein_name,
SUBSTRING(
srcfeature.residues,
-- MySQL has no patindex() or LEN(); LOCATE() and CHAR_LENGTH() are the equivalents.
LOCATE('SOMPATTERN', srcfeature.residues),
CHAR_LENGTH(srcfeature.residues) - LOCATE('SOMPATTERN', srcfeature.residues) + 1
) AS residue_sequence,
srcassembly.name AS source_type,
then try it in SQL. If not, use the application programming language.
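If the slicing does end up in application code, it might look like the sketch below. `extract_feature` is a hypothetical helper, not part of Chado or any library. It assumes `fmin` and `fmax` are interbase coordinates, which map directly onto a Python slice, and it assumes you want the reverse complement when `strand` is -1; drop that branch if you only need the forward-strand text.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_feature(residues, fmin, fmax, strand=1):
    """Slice a feature out of its source sequence.

    Assumes interbase coordinates, so residues[fmin:fmax] is the feature.
    For strand == -1 the segment is reverse-complemented (an assumption
    about what the caller wants, not something Chado does for you).
    """
    segment = residues[fmin:fmax]
    if strand == -1:
        segment = segment.translate(_COMPLEMENT)[::-1]
    return segment


assembly = "AACCGGTTAACC"  # made-up demo sequence
print(extract_feature(assembly, 2, 6))       # → CCGG
print(extract_feature(assembly, 4, 8, -1))   # → AACC
```

For the example row in the question, the call would be `extract_feature(residues, 21406, 22348, 1)`, which yields 942 characters, at the cost of having already transferred the full assembly sequence.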