Question

I am working with rdkit, a cheminformatics toolkit that provides a postgresql cartridge for storing chemical molecules. I want to create a django model as follows:

from rdkit.Chem import Mol

class compound(models.Model):
    internal = models.CharField(max_length=10 ,db_index=True)
    external = models.CharField(max_length=15,db_index=True)
    smiles   = models.TextField()
    # This is my proposed custom "mol" type defined by rdkit cartridge and that probably maps
    # to the Mol object imported from rdkit.Chem
    rdkit_mol = models.MyCustomMolField()

So I would like "rdkit_mol" to map to the rdkit postgres database cartridge type "mol". In SQL, the "mol" column is created from the "smiles" string using syntax like

postgres@compounds=# insert into compound (smiles,rdkit_mol,internal,external)  VALUES ('C1=CC=C[N]1',mol_from_smiles('C1=CC=C[N]1'), 'MYID-111111', 'E-2222222');

This calls the "mol_from_smiles" database function defined by the cartridge to create the mol object.

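Should I have the database handle the creation of this column during save? I could define a custom TRIGGER in postgres that runs the mol_from_smiles function to populate the rdkit_mol column.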

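I also want to be able to run queries that use the custom features of mol and return django models. For example, one SQL query could return the compound models that look chemically like this one. Currently in SQL I do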

select * from compound where rdkit_mol @> 'C1=CC=C[N]1';

and then essentially return the chemical "compound" objects.

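My question is: given the custom nature of my field, is there a way to mix and match the features of the database "mol" type with the django compound model? What are the ways to achieve this?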

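At the moment I am leaning towards not using the Django ORM and just using raw SQL to round-trip to the database. I would like to know whether there is a django way of working with such custom types.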

In my current hybrid approach, my views look like this.

def get_similar_compounds(request):
    # code to get the raw smiles string for eg 'C1=CC=C[N]1' from a form
    db_cursor.execute("select internal from compound where rdkit_mol @> 'C1=CC=C[N]1';")
    # code to get internal ids from database cursor
    similar_compounds = compound.objects.filter(internal__in = ids_from_query_above)
    # Then process queryset

Is this hybrid approach advisable, or is there a more pythonic/django way of dealing with this custom data type?

Answer 1

The way to mix them is to provide a custom field implementation, which is what you are already doing. There is not much more to it.

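Custom fields have quite an extensive protocol for customizing their behaviour. You can customize what happens before a value is sent to the database, what happens when it is received, and what happens when a particular lookup (e.g. mol__in=sth) is used.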

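In the current development version, Django allows you to provide custom lookup types, so you could even implement the @> operator (though I recommend sticking with the official stable release).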

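In the end it depends on what is easier for you. Providing a good, consistent MolField implementation may prove time-consuming, so it really depends on how many places you need it in. It might be more practical to just use raw SQL in those few places.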
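If raw SQL is the route taken, it helps to keep the statement in one helper and pass the SMILES string as a bound parameter instead of pasting it into the SQL text. A minimal sketch, assuming the table and column names from the question; the helper name `substructure_sql` is made up for illustration:

```python
def substructure_sql(smiles, table="compound", mol_column="rdkit_mol"):
    """Return (sql, params) for a cartridge substructure search."""
    # Table and column names cannot be bound as parameters, so they are
    # only accepted when they look like plain identifiers.
    for name in (table, mol_column):
        if not name.isidentifier():
            raise ValueError("unsafe identifier: %r" % (name,))
    # %s is the DB-API placeholder; the driver fills it in, not Python
    sql = "select internal from {0} where {1} @> %s".format(table, mol_column)
    return sql, [smiles]


sql, params = substructure_sql("C1=CC=C[N]1")
# cursor.execute(sql, params) would then run it through a DB-API cursor
```

Depending on the database driver, the placeholder may need an explicit cast to the cartridge type for the operator to resolve; that detail is not covered by this sketch.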

Answer 2

My question dealt mainly with the mechanism of creating a django custom field to handle the "mol" data type defined by the postgres rdkit data cartridge.

The solution I worked out consists of a custom field that co-exists with my model, plus raw SQL to run queries against the mol type.

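Since an rdkit "mol" type needs to be created every time a model instance containing SMILES is instantiated, I created a database procedure and a trigger that fires on table insert or update.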

# A south migration that defines a function called write_rdkit_mol_south in PL/PGSQL

from south.utils import datetime_utils as datetime
from south.db import db
from south.v2 import DataMigration
from django.db import models


class Migration(DataMigration):
    def forwards(self, orm):
        "Write your forwards methods here."
        db.execute("""create function write_rdkit_mol_south() RETURNS trigger as $write_rdkit_mol_south$
BEGIN
NEW.rdkit_mol := mol_from_smiles(NEW.smiles::cstring);
RETURN NEW;
END;
$write_rdkit_mol_south$ LANGUAGE plpgsql;""")
        db.execute(
            "create TRIGGER write_rdkit_mol_trig BEFORE INSERT OR UPDATE on strucinfo_compound  FOR EACH ROW EXECUTE PROCEDURE write_rdkit_mol_south();")

        # Note: Don't use "from appname.models import ModelName".
        # Use orm.ModelName to refer to models in this application,
        # and orm['appname.ModelName'] for models in other applications.

    def backwards(self, orm):
        "Write your backwards methods here."
        db.execute("drop TRIGGER write_rdkit_mol_trig ON strucinfo_compound;")
        db.execute("DROP FUNCTION write_rdkit_mol_south();")

Next, I created the custom field and the model.

# My Django model:
class compound(models.Model):
    internalid = models.CharField(max_length=10, db_index=True)
    externalid = models.CharField(max_length=15, db_index=True)
    smiles = models.TextField()
    rdkit_mol = RdkitMolField()  # the custom field defined below

    def save(self, *args, **kwargs):
        # Blank the field so that the database trigger fills it in from smiles
        self.rdkit_mol = ""
        super(compound, self).save(*args, **kwargs)

# The custom field
from django.db import models, DatabaseError
from rdkit import Chem


class RdkitMolField(models.Field):

    description = "Rdkit molecule field"

    def __init__(self, *args, **kwds):
        super(RdkitMolField, self).__init__(*args, **kwds)

    def db_type(self, connection):
        # Returning None makes Django skip this column when it generates the table SQL
        if connection.settings_dict['ENGINE'] == 'django.db.backends.postgresql_psycopg2':
            return None
        else:
            raise DatabaseError('Field type only supported for Postgres with rdkit cartridge')


    def to_python(self, value):
        if isinstance(value, Chem.Mol):
            return value
        if isinstance(value, basestring):
            # The database normally returns the smiles string
            return Chem.MolFromSmiles(str(value))
        else:
            if value:
                # if mol_send was used then we will have a pickled object
                return Chem.Mol(str(value))
            else:
                # The None case
                return "NO MOL"

    def get_prep_value(self, value):
        # This gets called during save. According to the docs, the method should
        # return data in a format prepared for use as a parameter in a query.
        # rdkit_mol queries do accept smiles strings.
        if isinstance(value, basestring):
            db_smiles = str(value)
            if not db_smiles:
                # Empty string: return None and let the database trigger fill the column
                return None
            my_mol = Chem.MolFromSmiles(db_smiles)
            if my_mol:
                # Round trip through a Mol object could be avoided
                return str(Chem.MolToSmiles(my_mol))
        # Not a string, or a smiles string that could not be parsed.
        # The database trigger will handle this, as this should happen only during insert or update
        return None

    def validate(self, value, model_instance):
        # This field is handled by database trigger so we do not want it to be used for object initiation
        if value is None:
            return
        else:
            super(RdkitMolField,self).validate(value,model_instance)

How to handle/map a custom postgresql type to a django model
