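First of all, I am new to databases, programming languages and so on, so I apologize if this question is not quite right or not specific enough. Any help or guidance would be much appreciated.
The context I am working in: I am querying an existing database through its API to retrieve some information with which to design my own database.
The purpose of this database is to let a user enter a gene and find out in which parts of the organism it is over-expressed (UP) or under-expressed (DOWN), and in which experiments that expression has been observed.
For now, all I am doing is querying the existing database and parsing the JSON results to get, for each organism part, all the over- or under-expressed genes (and, for each gene, the experiments in which that type of expression has been reported):
(In brain)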
Gene1
    Experiment1 UP
    Experiment2 UP
    Experiment3 UP
    Experiment4 DOWN
Gene2
    Experiment5 DOWN
    Experiment2 DOWN
    Experiment3 DOWN
    Experiment8 UP
    Experiment9 DOWN
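The tables I think I need are: "genes", "organs", "experiments" and "expression type" (plus "genes2experiments2organs").
This takes into account that a gene can be expressed in more than one organism part, that it can have different types of expression associated with more than one experiment, and that an experiment can contain more than one gene (many-to-many relationships).
What I would first like to know is how to add relational data, and whether my attempt is heading in the right direction or whether I should change the schema/idea of the database.
My first attempt is this: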
###########################################
# DATABASE DEFINITION
###########################################
from sqlalchemy import create_engine, Column, Integer, String, Date, ForeignKey, Table, Float
from sqlalchemy.orm import sessionmaker, relationship, backref
from sqlalchemy.ext.declarative import declarative_base
import requests

Base = declarative_base()

# Association table; MySQL needs an explicit length on VARCHAR columns.
Genes2experiments2organs = Table('genes2experiments2organs', Base.metadata,
    Column('gene_id', String(45), ForeignKey('genes.id')),
    Column('experiment_id', String(45), ForeignKey('experiments.id')),
    Column('organ_id', String(45), ForeignKey('organs.id'))
)

class Genes(Base):
    __tablename__ = 'genes'
    id = Column(String(45), primary_key=True)
    # backref="genes" only makes sense when declared on Genes, so the
    # two relationships live here rather than on Experiments.
    experiments = relationship("Experiments", secondary=Genes2experiments2organs, backref="genes")
    organs = relationship("Organs", secondary=Genes2experiments2organs, backref="genes")

    def __init__(self, id=""):
        self.id = id

    def __repr__(self):
        return "<genes(id:'%s')>" % (self.id)

class Experiments(Base):
    __tablename__ = 'experiments'
    id = Column(String(45), primary_key=True)

    def __init__(self, id=""):
        self.id = id

    def __repr__(self):
        return "<experiments(id:'%s')>" % (self.id)

class Organs(Base):
    __tablename__ = 'organs'
    id = Column(String(45), primary_key=True)

    def __init__(self, id=""):
        self.id = id

    def __repr__(self):
        return "<organs(id:'%s')>" % (self.id)

class Expression_type(Base):
    __tablename__ = 'expression_type'
    id = Column(String(45), primary_key=True)

    def __init__(self, id=""):
        self.id = id

    def __repr__(self):
        return "<expression_type(id:'%s')>" % (self.id)
#####################################################
# INSERTING DATA
#####################################################
def setUp():
    global Session
    engine = create_engine('mysql://root:password@localhost/db_name?charset=utf8', pool_recycle=3600, echo=False)
    Session = sessionmaker(bind=engine)

def add_data():  ## I am just adding genes without taking into account the other related data to these genes.....
    session = Session()
    # Fetch the results in pages of 200 rows.
    for i in range(0, 1000, 200):
        request = requests.get('http://www.ebi.ac.uk/gxa/api/v1', params={"updownInOrganism_part": "brain", "rows": 200, "start": i})
        result = request.json()  # json() is a method in current versions of requests
        for item in result['results']:
            gene_to_add = item['gene']['ensemblGeneId']
            # Nothing here guards against a gene that was already inserted.
            session.add(Genes(gene_to_add))
        session.commit()
    session.close()

setUp()
add_data()

session = Session()
genes = session.query(Genes).all()
print("List of genes introduced:")
for gene in genes:
    print(gene.id)
session.close()
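So, with this code I only populate the "genes" table, without taking into account the relationships to the other data that I will have to include in the database. What is the procedure for adding relational data? Also, is there a way to avoid inserting duplicate genes, for example while populating the table from the API queries?
By the way, as you can see, I have not put all the many-to-many relationships (secondary) in the "genes" table, because I am not sure whether I am on the right track or completely wrong. Thanks.
Answer 0 (score: 1)
This should do what you want: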
from sqlalchemy import (Column, create_engine, Integer, ForeignKey, Unicode,
                        Enum)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker, relationship

Base = declarative_base()


class Gene(Base):
    __tablename__ = 'gene'
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(64), unique=True)

    def __init__(self, name):
        self.name = name


class Experiment(Base):
    __tablename__ = 'experiment'
    id = Column(Integer, primary_key=True)


class Organ(Base):
    __tablename__ = 'organ'
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(64), unique=True)

    def __init__(self, name):
        self.name = name


class Measurement(Base):
    __tablename__ = 'measurement'
    id = Column(Integer, primary_key=True)
    experiment_id = Column(Integer, ForeignKey(Experiment.id))
    gene_id = Column(Integer, ForeignKey(Gene.id))
    organ_id = Column(Integer, ForeignKey(Organ.id))
    # Add your measured values here
    expression = Column(Enum('UP', 'DOWN'))
    # ...
    experiment = relationship(Experiment, backref='measurements')
    gene = relationship(Gene, backref='measurements')
    organ = relationship(Organ, backref='measurements')

    def __repr__(self):
        return 'Experiment %d: %s, %s, %s' % (self.experiment.id,
            self.gene.name, self.organ.name, self.expression)


if __name__ == '__main__':
    engine = create_engine('sqlite://')
    session = sessionmaker(engine)()
    Base.metadata.create_all(engine)

    #
    # Creating the data
    #
    x = Gene('Gene X')
    y = Gene('Gene Y')
    z = Gene('Gene Z')
    heart = Organ('Heart')
    lungs = Organ('Lungs')
    brain = Organ('Brain')
    session.add_all([x, y, z, heart, lungs, brain])
    session.commit()

    experiment_1 = Experiment()
    experiment_1.measurements.extend(
        [Measurement(gene_id=x.id, organ_id=heart.id, expression='UP'),
         Measurement(gene_id=x.id, organ_id=lungs.id, expression='UP'),
         Measurement(gene_id=x.id, organ_id=brain.id, expression='DOWN'),
         Measurement(gene_id=y.id, organ_id=brain.id, expression='UP'),
         Measurement(gene_id=z.id, organ_id=brain.id, expression='DOWN')])
    experiment_2 = Experiment()
    experiment_2.measurements.extend(
        [Measurement(gene_id=y.id, organ_id=lungs.id, expression='UP'),
         Measurement(gene_id=y.id, organ_id=lungs.id, expression='UP'),
         Measurement(gene_id=y.id, organ_id=brain.id, expression='UP'),
         Measurement(gene_id=x.id, organ_id=brain.id, expression='UP'),
         Measurement(gene_id=z.id, organ_id=heart.id, expression='UP')])
    session.add_all([experiment_1, experiment_2])
    session.commit()

    #
    # Querying the data
    #
    print('All measurements in the first experiment')
    experiment = session.query(Experiment).filter(Experiment.id == 1).one()
    for measurement in experiment.measurements:
        print(measurement)
    print('')

    print('All measurements of Gene X')
    gene_x = session.query(Gene).filter(Gene.name == 'Gene X').one()
    for measurement in gene_x.measurements:
        print(measurement)
    print('')

    print('All measurements of the brain')
    the_brain = session.query(Organ).filter(Organ.name == 'Brain').one()
    for measurement in the_brain.measurements:
        print(measurement)
    print('')
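On the question about duplicates when filling the tables from the API: one common approach is a "get or create" helper that looks a row up by its unique column and only inserts it when it is missing. The sketch below is self-contained and mirrors the models above, with a few assumptions of mine that are not part of the original code: the `accession` column on `Experiment`, the helper names `get_or_create`, `load_records` and `expression_of`, and the made-up sample records (the real API response would first have to be parsed into tuples of this shape).

```python
from sqlalchemy import Column, Enum, ForeignKey, Integer, Unicode, create_engine
from sqlalchemy.orm import relationship, sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Gene(Base):
    __tablename__ = 'gene'
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(64), unique=True)


class Organ(Base):
    __tablename__ = 'organ'
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(64), unique=True)


class Experiment(Base):
    __tablename__ = 'experiment'
    id = Column(Integer, primary_key=True)
    # Assumption: each experiment has an external identifier to look it up by.
    accession = Column(Unicode(64), unique=True)


class Measurement(Base):
    __tablename__ = 'measurement'
    id = Column(Integer, primary_key=True)
    experiment_id = Column(Integer, ForeignKey('experiment.id'))
    gene_id = Column(Integer, ForeignKey('gene.id'))
    organ_id = Column(Integer, ForeignKey('organ.id'))
    expression = Column(Enum('UP', 'DOWN'))
    experiment = relationship(Experiment, backref='measurements')
    gene = relationship(Gene, backref='measurements')
    organ = relationship(Organ, backref='measurements')


def get_or_create(session, model, **key):
    """Return the row matching `key`, inserting it first if it is missing."""
    instance = session.query(model).filter_by(**key).first()
    if instance is None:
        instance = model(**key)
        session.add(instance)
        session.flush()  # send the INSERT now so later lookups can find it
    return instance


def load_records(session, records):
    """Store (gene, organ, experiment, expression) tuples as measurements."""
    for gene_name, organ_name, accession, expression in records:
        session.add(Measurement(
            gene=get_or_create(session, Gene, name=gene_name),
            organ=get_or_create(session, Organ, name=organ_name),
            experiment=get_or_create(session, Experiment, accession=accession),
            expression=expression))
    session.commit()


def expression_of(session, gene_name):
    """List (organ, experiment, expression) for one gene, in insertion order."""
    rows = (session.query(Organ.name, Experiment.accession,
                          Measurement.expression)
            .select_from(Measurement)
            .join(Measurement.gene)
            .join(Measurement.organ)
            .join(Measurement.experiment)
            .filter(Gene.name == gene_name)
            .order_by(Measurement.id)
            .all())
    return [tuple(row) for row in rows]


engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Made-up records standing in for parsed API results.
records = [
    ('GENE_A', 'brain', 'EXP-1', 'UP'),
    ('GENE_A', 'heart', 'EXP-2', 'DOWN'),
    ('GENE_B', 'brain', 'EXP-1', 'DOWN'),
]
load_records(session, records)

for organ, accession, expression in expression_of(session, 'GENE_A'):
    print(organ, accession, expression)
```

With this, `GENE_A`, `brain` and `EXP-1` are each stored once even though they appear in several records. Note that a lookup followed by an insert is not safe against concurrent writers; for a single import script that is usually acceptable, and the `unique=True` constraints still act as a backstop.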