Question

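Update: I no longer believe the database is my problem. I am also seeing long (multi-second) load times for static assets in this application.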

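I have an application running as an Azure Cloud Service. It is a Node application that uses the Express framework for the back end and Backbone for the front end. I also have an Ubuntu virtual machine (also on Azure) running Postgres.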

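At certain times of day, any view that requires a database query (which is most of them) loads very slowly. Here is an example to illustrate. We have a paginated view that runs a fairly complex query with several joins, limited to 30 results. When I first load this view before 8 a.m., the page load time is about 200 ms. Right now, the same page typically takes 5 to 20 seconds to load.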
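
To put a number on each page load from the server side, something like the following could be logged per request. This is only a minimal sketch: `requestTimer` and `report` are hypothetical names, not part of Express or of my application, and it assumes the response object emits Node's standard `finish` event.

```javascript
// Hypothetical helper: builds Express-style middleware that reports how long
// each request took, from arrival until the response has been written out.
function requestTimer(report) {
  return function (req, res, next) {
    const start = process.hrtime.bigint();
    // 'finish' is emitted by Node's http.ServerResponse once the response
    // has been handed to the operating system for sending.
    res.on('finish', function () {
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      report({
        method: req.method,
        url: req.url,
        status: res.statusCode,
        elapsedMs: elapsedMs
      });
    });
    next();
  };
}

// Usage in an Express app (assumed here to be called `app`), registered
// before the static-file and route handlers so that both are measured:
//   app.use(requestTimer(function (entry) { console.log(entry); }));
```

Because the timer would sit in front of every handler, it would show whether static assets are as slow as the query-backed views at the same time of day.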

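Since the performance problem only seems to occur when a query is executed, I am fairly sure it has something to do with my database, but I don't know where to start. Is it my VM? Is it the database design? Would running something like pgBouncer on the server solve the problem?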

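I ran EXPLAIN ANALYZE on the query mentioned in my example. It reports that the query should take about 75 ms, which does not reflect the real performance I am seeing in the application. I even created a few new indexes to try to make the joins run more smoothly, but saw no difference at all. Here is the output of EXPLAIN ANALYZE: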

Limit  (cost=128.40..154.59 rows=30 width=191) (actual time=73.254..73.375 rows=30 loops=1)
  ->  WindowAgg  (cost=128.40..634.73 rows=580 width=191) (actual time=73.238..73.287 rows=30 loops=1)
        ->  Nested Loop Left Join  (cost=128.40..627.48 rows=580 width=191) (actual time=21.556..67.711 rows=580 loops=1)
              ->  Merge Right Join  (cost=128.12..277.33 rows=580 width=187) (actual time=21.530..49.437 rows=580 loops=1)
                    Merge Cond: (sites.id = siteinfo.siteid)
                    ->  Merge Left Join  (cost=127.85..215.80 rows=616 width=114) (actual time=21.507..38.946 rows=616 loops=1)
                          Merge Cond: (sites.id = userssites.siteid)
                          ->  Index Scan using sites_pkey on sites  (cost=0.28..54.34 rows=616 width=82) (actual time=0.010..1.125 rows=616 loops=1)
                          ->  Materialize  (cost=127.57..152.29 rows=611 width=69) (actual time=21.485..31.958 rows=611 loops=1)
                                ->  GroupAggregate  (cost=127.57..144.65 rows=611 width=61) (actual time=21.480..30.198 rows=611 loops=1)
                                      ->  Sort  (cost=127.57..130.72 rows=1259 width=61) (actual time=21.465..23.196 rows=1259 loops=1)
                                            Sort Key: userssites.siteid
                                            Sort Method: quicksort  Memory: 226kB
                                            ->  Hash Join  (cost=20.84..62.75 rows=1259 width=61) (actual time=1.461..6.713 rows=1259 loops=1)
                                                  Hash Cond: (userssites.userid = users.id)
                                                  ->  Seq Scan on userssites  (cost=0.00..24.59 rows=1259 width=41) (actual time=0.008..1.771 rows=1259 loops=1)
                                                  ->  Hash  (cost=14.82..14.82 rows=482 width=28) (actual time=1.439..1.439 rows=482 loops=1)
                                                        Buckets: 1024  Batches: 1  Memory Usage: 30kB
                                                        ->  Seq Scan on users  (cost=0.00..14.82 rows=482 width=28) (actual time=0.008..0.713 rows=482 loops=1)
                    ->  Index Scan using siteinfo_pkey on siteinfo  (cost=0.28..52.73 rows=580 width=110) (actual time=0.011..4.958 rows=580 loops=1)
                          Filter: ((street1 IS NOT NULL) OR (street2 IS NOT NULL) OR (zip IS NOT NULL) OR (city IS NOT NULL) OR (state IS NOT NULL) OR (country IS NOT NULL))
              ->  Index Scan using agsfiles_pkey on agsfiles  (cost=0.28..0.59 rows=1 width=78) (actual time=0.011..0.012 rows=1 loops=580)
                    Index Cond: (id = sites.agsfileid)
Total runtime: 73.594 ms
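
The runtime above is measured inside Postgres, so it excludes anything that happens between the application and the database, such as obtaining a connection and transferring the rows over the network. A minimal sketch of how the same query could be timed from the Node side for comparison follows. The names `timed` and `fakeQuery` are hypothetical, and `fakeQuery` merely simulates a delay; in the real application it would be replaced by the actual database call.

```javascript
// Hypothetical helper: runs an async function and resolves with both its
// result and the wall-clock time it took, in milliseconds.
async function timed(label, fn) {
  const start = process.hrtime.bigint();
  const result = await fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  return { label: label, elapsedMs: elapsedMs, result: result };
}

// Stand-in for the real database call; it only simulates a short delay
// and resolves with a single made-up row.
function fakeQuery() {
  return new Promise(function (resolve) {
    setTimeout(function () { resolve([{ id: 1 }]); }, 50);
  });
}

timed('paginated sites query', fakeQuery).then(function (t) {
  console.log(
    t.label + ': ' + t.elapsedMs.toFixed(1) + ' ms, ' +
    t.result.length + ' row(s)'
  );
});
```

If the time measured this way is close to 75 ms while the page still takes seconds, the delay would have to be outside the query itself.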

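Strangely, my staging instance (which uses a separate database of identical design, running on the same database server as the production instance) seems to perform well, with queries returning in 150-250 ms, while the same queries take 2.5 seconds in production.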

Where do I start in tracking down and fixing this problem?

How do I find the performance problem in my Node / Express, Backbone, Postgres application?
