Creating a GIN index with trigram (gin_trgm_ops) in a Django model

Time: 2017-06-29 08:46:19

Tags: python django postgresql indexing similarity

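The new TrigramSimilarity feature in django.contrib.postgres is a great fit for a problem I have. I use it for a search bar to find hard-to-spell Latin names. The problem is that there are over 2 million names, and the search takes longer than I want.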
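For readers unfamiliar with what TrigramSimilarity compares, here is a rough pure-Python sketch of the rules given in the pg_trgm documentation: text is lowercased, split into alphanumeric words, each word is padded with two leading spaces and one trailing space, and the similarity is the number of shared trigrams divided by the number of distinct trigrams in both strings. The function names are my own, and the real extension may treat some characters differently, so take it as an approximation only.

```python
import re


def trigrams(text):
    """Approximate the trigram set pg_trgm would extract from text."""
    result = set()
    # Non-alphanumeric characters (including "_") only separate words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))  # ['  c', ' ca', 'at ', 'cat']
print(similarity("word", "two words"))
```

Without an index, every one of the 2 million names has to be scored this way for each search, which is why the query is slow.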

I would like to create an index on the trigrams, as described in the postgres documentation: https://www.postgresql.org/docs/9.6/static/pgtrgm.html

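But I am not sure how to do this in a way that the Django API would make use of it. For postgres text search there is a description of how to create an index, but not for trigram similarity. https://docs.djangoproject.com/en/1.11/ref/contrib/postgres/search/#performance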

This is what I have at the moment:

from django.contrib.postgres.indexes import GinIndex
from django.db import models

class NCBI_names(models.Model):
    tax_id = models.ForeignKey(NCBI_nodes, on_delete=models.CASCADE, default=0)
    name_txt = models.CharField(max_length=255, default='')
    name_class = models.CharField(max_length=32, db_index=True, default='')

    class Meta:
        indexes = [GinIndex(fields=['name_txt'])]

Then in the view's get_queryset I do:

Edit: put the whole view class in.

5 Answers:

Answer 0 (score: 9)

I had a similar problem, trying to use the pg_trgm extension to support efficient contains and icontains Django field lookups.

There may be a more elegant way, but defining a new index type like this worked for me:

from django.contrib.postgres.indexes import GinIndex

class TrigramIndex(GinIndex):
    def get_sql_create_template_values(self, model, schema_editor, using):
        fields = [model._meta.get_field(field_name) for field_name, order in self.fields_orders]
        tablespace_sql = schema_editor._get_index_tablespace_sql(model, fields)
        quote_name = schema_editor.quote_name
        columns = [
            ('%s %s' % (quote_name(field.column), order)).strip() + ' gin_trgm_ops'
            for field, (field_name, order) in zip(fields, self.fields_orders)
        ]
        return {
            'table': quote_name(model._meta.db_table),
            'name': quote_name(self.name),
            'columns': ', '.join(columns),
            'using': using,
            'extra': tablespace_sql,
        }

The method get_sql_create_template_values is copied from Index.get_sql_create_template_values(), with just one modification: the addition of + ' gin_trgm_ops'.

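For your use case, you would then define the index on name_txt using this TrigramIndex instead of a GinIndex. Then run makemigrations, which will produce a migration that generates the required CREATE INDEX SQL.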
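As a rough check of what that migration should end up running, the sketch below fills the PostgreSQL CREATE INDEX template with the same column expression that TrigramIndex builds. It does not import Django: quote_name is a simplified stand-in for schema_editor.quote_name, and the table and index names are hypothetical.

```python
# Template used by Django's PostgreSQL schema editor for CREATE INDEX.
SQL_CREATE_INDEX = "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s)%(extra)s"


def quote_name(name):
    # Simplified stand-in: does not escape embedded double quotes.
    return '"%s"' % name


def trigram_index_sql(table, index_name, fields_orders):
    """Render the statement with ' gin_trgm_ops' appended to every column."""
    columns = [
        ("%s %s" % (quote_name(column), order)).strip() + " gin_trgm_ops"
        for column, order in fields_orders
    ]
    return SQL_CREATE_INDEX % {
        "table": quote_name(table),
        "name": quote_name(index_name),
        "columns": ", ".join(columns),
        "using": " USING gin",
        "extra": "",
    }


# Hypothetical table and index names, for illustration only.
sql = trigram_index_sql("myapp_ncbi_names", "ncbi_names_name_txt_trgm", [("name_txt", "")])
print(sql)
```

You can compare this with the output of sqlmigrate for the generated migration. The pg_trgm extension has to be installed in the database before the statement runs, otherwise gin_trgm_ops does not exist.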

UPDATE:

I see you are also doing a query using icontains:

result.exclude(name_txt__icontains = 'sp.')

The Postgresql backend will turn that into something like this:

UPPER("NCBI_names"."name_txt"::text) LIKE UPPER('%sp.%')

and then the trigram index will not be used because of the UPPER().

I had the same problem, and ended up subclassing the database backend to work around it:

from django.db.backends.postgresql import base, operations

class DatabaseFeatures(base.DatabaseFeatures):
    pass

class DatabaseOperations(operations.DatabaseOperations):
    def lookup_cast(self, lookup_type, internal_type=None):
        lookup = '%s'

        # Cast text lookups to text to allow things like filter(x__contains=4)
        if lookup_type in ('iexact', 'contains', 'icontains', 'startswith',
                           'istartswith', 'endswith', 'iendswith', 'regex', 'iregex'):
            if internal_type in ('IPAddressField', 'GenericIPAddressField'):
                lookup = "HOST(%s)"
            else:
                lookup = "%s::text"

        return lookup


class DatabaseWrapper(base.DatabaseWrapper):
    """
        Override the defaults where needed to allow use of trigram index
    """
    ops_class = DatabaseOperations

    def __init__(self, *args, **kwargs):
        self.operators.update({
            'icontains': 'ILIKE %s',
            'istartswith': 'ILIKE %s',
            'iendswith': 'ILIKE %s',
        })
        self.pattern_ops.update({
            'icontains': "ILIKE '%%' || {} || '%%'",
            'istartswith': "ILIKE {} || '%%'",
            'iendswith': "ILIKE '%%' || {}",
        })
        super(DatabaseWrapper, self).__init__(*args, **kwargs)
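To make the effect of this override concrete, the sketch below composes the two sides of an icontains condition from a lookup_cast template and an operator template. The "stock" values are my reading of the Django 1.11 PostgreSQL backend and should be checked against your version; the "patched" values are the ones set above. The table name is hypothetical.

```python
# Hypothetical quoted column reference.
COLUMN = '"myapp_ncbi_names"."name_txt"'

# Assumed defaults of the stock backend for icontains (verify for your version).
STOCK = {"lookup_cast": "UPPER(%s::text)", "operator": "LIKE UPPER(%s)"}
# Values produced by the subclassed backend shown above.
PATCHED = {"lookup_cast": "%s::text", "operator": "ILIKE %s"}


def where_fragment(backend, column):
    """Join the cast column with the operator, keeping %s as the parameter."""
    lhs = backend["lookup_cast"] % column
    rhs = backend["operator"] % "%s"
    return lhs + " " + rhs


stock_sql = where_fragment(STOCK, COLUMN)
patched_sql = where_fragment(PATCHED, COLUMN)
print(stock_sql)
print(patched_sql)
```

The first fragment wraps the column in UPPER(), which is the form a plain trigram index on the column does not match; the second leaves the column bare and uses ILIKE. To activate such a backend, save the subclass code as base.py inside a package and set ENGINE in DATABASES to that package's dotted path.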

Answer 1 (score: 3)

Inspired by an old article on this subject, I landed on a current one, which gives the following solution for a GistIndex:

Update: From Django 1.11 things seem to be simpler, as this answer and the Django docs suggest:

from django.contrib.postgres.indexes import GinIndex

class MyModel(models.Model):
    the_field = models.CharField(max_length=512, db_index=True)

    class Meta:
        indexes = [GinIndex(fields=['the_field'])]

From Django 2.2, the opclasses attribute of class Index(fields=(), name=None, db_tablespace=None, opclasses=()) can be used for this purpose.


from django.contrib.postgres.indexes import GistIndex

class GistIndexTrgrmOps(GistIndex):
    def create_sql(self, model, schema_editor):
        # - this Statement is instantiated by the _create_index_sql()
        #   method of django.db.backends.base.schema.BaseDatabaseSchemaEditor.
        #   using sql_create_index template from
        #   django.db.backends.postgresql.schema.DatabaseSchemaEditor
        # - the template has original value:
        #   "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s)%(extra)s"
        statement = super().create_sql(model, schema_editor)
        # - however, we want to use a GIST index to accelerate trigram
        #   matching, so we want to add the gist_trgm_ops index operator
        #   class
        # - so we replace the template with:
        #   "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s gist_trgm_ops)%(extra)s"
        statement.template =\
            "CREATE INDEX %(name)s ON %(table)s%(using)s (%(columns)s gist_trgm_ops)%(extra)s"

        return statement

You can then use it in your model class like this:

class YourModel(models.Model):
    some_field = models.TextField(...)

    class Meta:
        indexes = [
            GistIndexTrgrmOps(fields=['some_field'])
        ]

Answer 2 (score: 2)

In case someone wants an index on multiple columns joined (concatenated) with a space, you can use my modification of the built-in index.

It creates an index like:

gin (("column1" || ' ' || "column2" || ' ' || ...) gin_trgm_ops)
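The index class from this answer is not shown here, so the following is only a sketch of the SQL such an index would need, not the author's implementation. It builds the concatenated expression and wraps it in the extra parentheses PostgreSQL requires for an expression index. The function name and the table, index and column names are hypothetical.

```python
def concat_trigram_index_sql(index_name, table, columns):
    """Build CREATE INDEX SQL for a trigram index over space-joined columns."""
    # Simplified quoting: embedded double quotes are not escaped.
    quoted = ['"%s"' % column for column in columns]
    expression = " || ' ' || ".join(quoted)
    return 'CREATE INDEX "%s" ON "%s" USING gin ((%s) gin_trgm_ops)' % (
        index_name, table, expression,
    )


# Hypothetical names, for illustration only.
sql = concat_trigram_index_sql(
    "person_fullname_trgm", "myapp_person", ["first_name", "last_name"]
)
print(sql)
```

A statement like this could be run from a migration with migrations.RunSQL. Keep in mind that an expression index is only considered when the query uses the same expression, and that || yields NULL as soon as one of the columns is NULL.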

Answer 3 (score: 1)

This has already been answered, but in Django 2.2 you can do this more easily:

from django.contrib.postgres.indexes import GistIndex
from django.db import models

class MyModel(models.Model):
    name = models.TextField()

    class Meta:
        indexes = [GistIndex(name="gist_trgm_idx", fields=("name",), opclasses=("gist_trgm_ops",))]

Alternatively, you can use a GinIndex with the gin_trgm_ops operator class.

Answer 4 (score: 0)

To make Django 2.2 use the index for icontains and similar searches:

Subclass GinIndex:

from django.contrib.postgres.indexes import GinIndex

class UpperGinIndex(GinIndex):

    def create_sql(self, model, schema_editor, using=''):
        statement = super().create_sql(model, schema_editor, using=using)
        quote_name = statement.parts['columns'].quote_name

        def upper_quoted(column):
            return f'UPPER({quote_name(column)})'
        statement.parts['columns'].quote_name = upper_quoted
        return statement

Add the index to your model like this, including the kwarg name, which is required when using opclasses:

class MyModel(Model):
    name = TextField(...)

    class Meta:
        indexes = [
            UpperGinIndex(fields=['name'], name='mymodel_name_gintrgm', opclasses=['gin_trgm_ops'])
        ]

Generate the migration and edit the generated file:

# Generated by Django 2.2.3 on 2019-07-15 10:46
from django.contrib.postgres.operations import TrigramExtension  # <<< add this
from django.db import migrations
import myapp.models


class Migration(migrations.Migration):

    operations = [
        TrigramExtension(),   # <<< add this
        migrations.AddIndex(
            model_name='mymodel',
            index=myapp.models.UpperGinIndex(fields=['name'], name='mymodel_name_gintrgm', opclasses=['gin_trgm_ops']),
        ),
    ]