Question

I am trying to remove duplicates from a database query. Here are my models:

from django.db import models

# The Supplier model is referenced below but is not shown in the question.

class Restaurant(models.Model):
    # max_length is required by CharField; 255 is an assumed value.
    name = models.CharField(max_length=255, db_index=True)

class InformationSheet(models.Model):
    # on_delete is required since Django 2.0; CASCADE is assumed here.
    owner = models.ForeignKey(Restaurant, on_delete=models.CASCADE, related_name='sheet')
    latitude = models.DecimalField(max_digits=10, decimal_places=6)
    longitude = models.DecimalField(max_digits=10, decimal_places=6)

    class Meta:
        indexes = [
            models.Index(fields=['latitude', 'longitude', 'owner']),
        ]

class Availability(models.Model):
    restaurant = models.ForeignKey(Restaurant, on_delete=models.CASCADE, db_index=True)
    supplier = models.ForeignKey(Supplier, on_delete=models.CASCADE)

    class Meta:
        indexes = [
            models.Index(fields=['restaurant', 'supplier']),
        ]

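I need to select the InformationSheet rows that fall between given GPS coordinates, but only when the restaurant has an Availability defined for one of the given suppliers.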

    suppliers = [1, 2, 3]
    sheets = InformationSheet.objects.filter(
        latitude__gte=lat_start,
        latitude__lte=lat_end,
        longitude__gte=long_min,
        longitude__lte=long_max,
        owner__availability__supplier_id__in=suppliers
    ).distinct()

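The table has hundreds of thousands of entries. The SQL query that was generated initially was fast, but adding the distinct() clause to eliminate duplicates makes the query too slow for my needs, because DISTINCT prevents the indexes from being used.

How should I proceed?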
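
One direction that is often suggested for this kind of problem is to avoid producing the duplicates in the first place: the join to Availability returns one row per matching supplier, whereas an EXISTS subquery only tests whether at least one match is present. The sketch below illustrates the difference with the standard-library sqlite3 module rather than the real PostgreSQL database; the table names, column names, and sample rows are simplified assumptions, not the actual schema.

```python
import sqlite3

# In-memory database with a simplified, assumed version of the three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE restaurant (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE information_sheet (
        id INTEGER PRIMARY KEY,
        owner_id INTEGER REFERENCES restaurant(id),
        latitude REAL,
        longitude REAL
    );
    CREATE TABLE availability (
        id INTEGER PRIMARY KEY,
        restaurant_id INTEGER REFERENCES restaurant(id),
        supplier_id INTEGER
    );
    INSERT INTO restaurant VALUES (1, 'A'), (2, 'B');
    INSERT INTO information_sheet VALUES (10, 1, 48.85, 2.35), (20, 2, 45.76, 4.83);
    -- Restaurant 1 has two matching suppliers; restaurant 2 has none.
    INSERT INTO availability VALUES (100, 1, 1), (101, 1, 2), (102, 2, 9);
""")

# Bounding box: latitude min/max, then longitude min/max (hypothetical values).
params = (40.0, 50.0, 0.0, 10.0)

# Join version: one output row per (sheet, matching availability) pair.
join_sql = """
    SELECT s.id FROM information_sheet AS s
    JOIN availability AS a ON a.restaurant_id = s.owner_id
    WHERE s.latitude BETWEEN ? AND ? AND s.longitude BETWEEN ? AND ?
      AND a.supplier_id IN (1, 2, 3)
    ORDER BY s.id
"""

# EXISTS version: each sheet appears at most once, so no DISTINCT is needed.
exists_sql = """
    SELECT s.id FROM information_sheet AS s
    WHERE s.latitude BETWEEN ? AND ? AND s.longitude BETWEEN ? AND ?
      AND EXISTS (
          SELECT 1 FROM availability AS a
          WHERE a.restaurant_id = s.owner_id AND a.supplier_id IN (1, 2, 3)
      )
    ORDER BY s.id
"""

join_ids = [row[0] for row in conn.execute(join_sql, params)]
exists_ids = [row[0] for row in conn.execute(exists_sql, params)]
print(join_ids)    # sheet 10 is listed twice
print(exists_ids)  # sheet 10 is listed once
```

In Django's ORM, a query of this shape can usually be written with `Exists` and `OuterRef` from `django.db.models` (filtering directly on an `Exists` expression requires Django 3.0 or later). Whether it is actually faster on the real data would have to be checked with EXPLAIN ANALYZE.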

Answer 1

Limit  (cost=1.12..325.09 rows=30 width=215)
  ->  Unique  (cost=1.12..76866.65 rows=7118 width=215)
        ->  Nested Loop  (cost=1.12..76848.85 rows=7118 width=215)
              ->  Nested Loop  (cost=0.70..73308.59 rows=7118 width=219)
                    ->  Index Scan using restaurant_sheet_pkey on accommodation_sheet  (cost=0.42..12290.52 rows=193562 width=215)
                          Filter: ((latitude >= '-180.0000000'::numeric) AND (latitude <= 180.0000000) AND (longitude >= '-200.0000000'::numeric) AND (longitude <= 200.0000000))
                    ->  Index Only Scan using restaur_resta_i_3e16b7_idx on restaurant_availability  (cost=0.28..0.31 rows=1 width=4)
                          Index Cond: (hotel_id = restaurant_sheet.owner_id)
                          Filter: (supplier_id = ANY ('{1,2,3}'::integer[]))
              ->  Index Only Scan using restaurant_restaurant_pkey on restaurant_restaurant  (cost=0.42..0.50 rows=1 width=4)
                    Index Cond: (id = restaurant_sheet.owner_id)

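In fact, looking at the query plan, the indexes are being used, but I don't understand why the query takes 500 ms (and that is on only a subset of our database). I have an SSD, as does our development server.

I think my models are poorly designed, but I really don't know what to do about it.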

Django: Queryset distinct clause performance issue

1 answer: