I am working with a MySQL database that contains a table of genomic islands, in this format:
+----+-------+----------+----------+-----------------------------------------------+
| id | chrom | start | end | line_string |
+----+-------+----------+----------+-----------------------------------------------+
| 1 | 1 | 36568608 | 36569851 | ?? ?o?A ?? ?p?A |
| 2 | 1 | 82313020 | 82313491 | ?? ????A ?? L??A |
+----+-------+----------+----------+-----------------------------------------------+
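The line_string column is built as: GeomFromText('Linestring(chrom start, chrom end)')
The "start" and "end" numbers are base positions.
I currently use the following command in my Python script to classify a region as Island or non-Island: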
SELECT 'Island' as Island FROM islands
WHERE MBRIntersects(GeomFromText('Linestring(%d %d, %d %d)'), line_string)
UNION ALL SELECT 'non-Island' LIMIT 1 % (Chr, Start, Chr, End)
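However, I would like to modify this query to also define island shores and shelves, where:
Island shore - within 2,000 bases of an island
Island shelf - between 2,000 and 4,000 bases from an island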
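To make the intended categories concrete, here is a minimal pure-Python sketch of the definitions above for a single island on the same chromosome. The function name `classify` and its parameters are hypothetical and only illustrate the distance rules; they are not part of the database or the script. Regions that touch or overlap the island count as "Island", mirroring how a bounding-box intersection test treats shared boundaries.

```python
def classify(start, end, island_start, island_end):
    """Classify a region relative to one island on the same chromosome."""
    # Gap in bases between the region and the island; 0 if they overlap or touch.
    gap = max(island_start - end, start - island_end, 0)
    if gap == 0:
        return "Island"
    if gap <= 2000:
        return "Shore"
    if gap <= 4000:
        return "Shelf"
    return "non-Island"


# Example using the first island from the table (36568608 to 36569851).
print(classify(36567000, 36567500, 36568608, 36569851))
```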
Answer 0: (score: 1)
I solved this by using:
SELECT 'Island' as Island FROM methylation.islands FORCE INDEX (locations)
WHERE MBRIntersects(GeomFromText('Linestring(%d %d, %d %d)'), line_string)
UNION ALL SELECT 'Shore' FROM methylation.islands FORCE INDEX (locations)
WHERE MBRIntersects(GeomFromText('Linestring(%d %d, %d %d)'), line_string)
UNION ALL SELECT 'Shelf' FROM methylation.islands FORCE INDEX (locations)
WHERE MBRIntersects(GeomFromText('Linestring(%d %d, %d %d)'), line_string)
UNION ALL SELECT 'Other' LIMIT 1
% (Chr, Start, Chr, End, Chr, Start-2000, Chr, End+2000, Chr, Start-4000, Chr, End+4000)
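This way, any region that intersects an island is listed as "Island". Failing that, if it lies within +/- 2,000 base pairs of an island it is listed as "Shore", and failing that, if it lies within +/- 4,000 base pairs it is listed as "Shelf". Everything else is considered "Other". Because of the LIMIT 1, only the first matching term is returned.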
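The twelve positional arguments in the format tuple are easy to get out of order, so here is a hedged sketch of one way to assemble the same query from a list of tiers. The helper names `region_wkt` and `build_query` and the `TIERS` list are my own illustrative assumptions; the table name, index name, and SQL functions are the ones used above. It assumes Chr, Start, and End are integers, as the `%d` placeholders require.

```python
# (label, padding in bases), listed in the order the query should test them.
TIERS = [("Island", 0), ("Shore", 2000), ("Shelf", 4000)]


def region_wkt(chrom, start, end):
    """WKT linestring using the table's encoding: x = chromosome, y = position."""
    return "Linestring(%d %d, %d %d)" % (chrom, start, chrom, end)


def build_query(chrom, start, end):
    """Build the Island/Shore/Shelf/Other query for one region."""
    parts = []
    for label, pad in TIERS:
        # Widen the query region by the tier's padding on both sides.
        wkt = region_wkt(chrom, start - pad, end + pad)
        parts.append(
            "SELECT '%s' AS Island FROM methylation.islands FORCE INDEX (locations)\n"
            "WHERE MBRIntersects(GeomFromText('%s'), line_string)" % (label, wkt)
        )
    parts.append("SELECT 'Other' LIMIT 1")
    return "\nUNION ALL ".join(parts)


print(build_query(1, 36568000, 36568100))
```

One caveat, as far as I know: this relies on MySQL returning the UNION ALL branches in the order they are written, which SQL does not formally guarantee without an ORDER BY, so it may be worth checking on your server version.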