查询需要30-40秒(通过ORM和原始SQL)。 200kk行

时间:2016-02-14 20:17:01

标签: python django postgresql django-1.9

我的查询有问题需要30-40秒。

我接下来要做2个查询:

country = CountriesTranslations.objects.filter(name='United Kingdom').first().country # Takes 1-8 secs
city = Cities.objects.filter(citiestranslations__name='London', country=country).first() # Takes 30-40 secs

通过connection.queries检查第二个查询的原始版本,以查看Django尝试执行的内容 然后通过pgAdmin启动相同的代码以避免Python \ ORM的影响 - 执行时间非常相似

原始查询:

SELECT * FROM cities
INNER JOIN cities_translations ON (cities.id = cities_translations.city_id)
WHERE (cities_translations.name = 'London' AND cities.country_id = 32) ORDER BY cities.id ASC LIMIT 1

这是'EXPLAIN ANALYZE':

"Limit  (cost=0.43..217162.78 rows=1 width=182) (actual time=35674.343..35674.344 rows=1 loops=1)"
"  ->  Nested Loop  (cost=0.43..10206631.04 rows=47 width=182) (actual time=35674.341..35674.341 rows=1 loops=1)"
"        Join Filter: (cities.id = cities_translations.city_id)"
"        Rows Removed by Join Filter: 33684"
"        ->  Index Scan using city_pkey on cities  (cost=0.43..5202957.08 rows=20514 width=168) (actual time=83.479..92.091 rows=16843 loops=1)"
"              Filter: (country_id = 32)"
"              Rows Removed by Filter: 164074"
"        ->  Materialize  (cost=0.00..3210046.95 rows=5829 width=14) (actual time=0.193..2.112 rows=2 loops=16843)"
"              ->  Seq Scan on cities_translations  (cost=0.00..3210017.80 rows=5829 width=14) (actual time=3247.887..35572.948 rows=2 loops=1)"
"                    Filter: ((name)::text = 'London'::text)"
"                    Rows Removed by Filter: 178754098"
"Planning time: 2.122 ms"
"Execution time: 35674.519 ms"

型号:

class Countries(models.Model):
    iso2 = models.CharField('ISO 2', max_length=2, blank=True)
    iso3 = models.CharField('ISO 3', max_length=3, blank=True)
    # ... lots of other stuff

    class Meta:
        db_table = 'countries'
        verbose_name = 'Country'
        verbose_name_plural = 'Countries'
        ordering = ['iso2']

class CountriesTranslations(models.Model):
    name = models.CharField("Translation", max_length=60, db_index=True, blank=True)
    is_short_name = models.IntegerField()
    description = models.TextField(blank=True)

    country = models.ForeignKey(Countries, on_delete=models.CASCADE, verbose_name='Country')
    lang = models.ForeignKey(Languages, on_delete=models.CASCADE, null=True, verbose_name='Language')

    class Meta:
        db_table = 'countries_translations'
        verbose_name = 'Country Translation'
        verbose_name_plural = 'Countries Translations'
        ordering = ['name']


class Cities(models.Model):
    city_type = models.ForeignKey('CitiesType', on_delete=models.CASCADE, verbose_name='City type', null=True)
    country = models.ForeignKey('Countries', on_delete=models.CASCADE, verbose_name='Country')
    # ... lots of other stuff

    class Meta:
        db_table = 'cities'
        verbose_name = 'City'
        verbose_name_plural = 'Cities'

class CitiesTranslations(models.Model):
    name = models.CharField(max_length=150, db_index=True, blank=True)
    description = models.TextField(blank=True)

    city = models.ForeignKey(Cities, on_delete=models.CASCADE, verbose_name='Details of City')
    lang = models.ForeignKey('Languages', on_delete=models.CASCADE, verbose_name='Language of city', null=True)

    class Meta:
        db_table = 'cities_translations'
        verbose_name = 'City Translation'
        verbose_name_plural = 'Cities Translations'
        ordering = ['name']

系统:

  • Django 1.9
  • PostgreSQL 9.4.6
  • Ubuntu 15.10
  • SSD(不是HDD)

行:

  • CountriesTranslations~11k
  • 城市~2.5kk
  • CitiesTranslations~180kk

还尝试根据本文更改Postgresql的默认conf:https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

有人可以建议我去哪儿挖吗?

1 个答案:

答案 0 :(得分:1)

对于此查询:

SELECT *
FROM cities c INNER JOIN
     cities_translations ct
     ON c.id = ct.city_id
WHERE ct.name = 'London' AND c.country_id = 32
ORDER BY c.id ASC
LIMIT 1;

您想尝试一些索引。我建议的第一个是cities(country_id, id)cities_translations(city_id, name)