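I have a problem with my app hosted on Heroku (2 web + 1 worker dynos). One process ends with a large number of emails being created and sent via SendGrid. This takes a while, which caused web worker timeouts and poor usability, so it was refactored into a background worker. I thought this would solve the problem, but I get situations like this: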
Apr 10 17:12:48 wc heroku/web.2: Processing by DealsController#show as */*
[request is processed]
Apr 10 17:12:50 wc app/worker.1: [worker sending emails]
[a lot of lines with debug data cut]
Apr 10 17:12:53 wc heroku/worker.1: source=worker.1 dyno=heroku.16079351.858a3455-0b9f-4f75-9052-b419d4653703 sample#load_avg_1m=0.29 sample#load_avg_5m=0.07 sample#load_avg_15m=0.02
Apr 10 17:12:53 wc heroku/worker.1: source=worker.1 dyno=heroku.16079351.858a3455-0b9f-4f75-9052-b419d4653703 sample#memory_total=240.45MB sample#memory_rss=240.34MB sample#memory_cache=0.11MB sample#memory_swap=0.00MB sample#memory_pgpgin=85990pages sample#memory_pgpgout=24436pages
Apr 10 17:12:53 wc heroku/web.2: source=web.2 dyno=heroku.16079351.879182ef-13f0-4908-bf35-c487ccab6153 sample#load_avg_1m=0.00 sample#load_avg_5m=0.01 sample#load_avg_15m=0.01
Apr 10 17:12:54 wc heroku/web.2: source=web.2 dyno=heroku.16079351.879182ef-13f0-4908-bf35-c487ccab6153 sample#memory_total=844.16MB sample#memory_rss=511.82MB sample#memory_cache=0.00MB sample#memory_swap=332.34MB sample#memory_pgpgin=223581pages sample#memory_pgpgout=92554pages
Apr 10 17:12:54 wc heroku/web.2: Process running mem=844M(164.9%)
Apr 10 17:12:54 wc heroku/web.2: Error R14 (Memory quota exceeded)
Apr 10 17:12:54 wc app/web.2: ** [NewRelic][04/10/14 15:12:54 +0000 879182ef-13f0-4908-bf35-c487ccab6153 (468)] INFO : Starting Agent shutdown
Apr 10 17:12:55 wc app/worker.1: ** [Bugsnag] Bugsnag exception handler 1.6.2 ready, api_key=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[8 new relic lines cut]
Apr 10 17:12:56 wc app/heroku-postgres: source=HEROKU_POSTGRESQL_WHITE sample#current_transaction=144542 sample#db_size=189720760bytes sample#tables=92 sample#active-connections=12 sample#waiting-connections=0 sample#index-cache-hit-rate=0.99981 sample#table-cache-hit-rate=0.99868 sample#load-avg-1m=0.36 sample#load-avg-5m=0.3 sample#load-avg-15m=0.285 sample#read-iops=38.367 sample#write-iops=13.221 sample#memory-total=7629464kB sample#memory-free=187884kB sample#memory-cached=6599816kB sample#memory-postgres=689216kB
Apr 10 17:12:56 wc app/worker.1: ** [NewRelic][04/10/14 15:12:55 +0000 858a3455-0b9f-4f75-9052-b419d4653703 (99)] INFO : Installing DelayedJob instrumentation hooks
[11 new relic lines cut]
Apr 10 17:12:59 wc app/worker.1: Delayed::Backend::ActiveRecord::Job Load (1.6ms) UPDATE "delayed_jobs" SET locked_at = '2014-04-10 15:12:59.482734', locked_by = 'host:858a3455-0b9f-4f75-9052-b419d4653703 pid:2' WHERE id IN (SELECT id FROM "delayed_jobs" WHERE ((run_at <= '2014-04-10 15:12:59.481969' AND (locked_at IS NULL OR locked_at < '2014-04-10 11:12:59.482002') OR locked_by = 'host:858a3455-0b9f-4f75-9052-b419d4653703 pid:2') AND failed_at IS NULL) ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
Apr 10 17:12:59 wc app/worker.1: ** [NewRelic][04/10/14 15:12:59 +0000 858a3455-0b9f-4f75-9052-b419d4653703 (99)] INFO : Starting Agent shutdown
Apr 10 17:13:09 wc app/worker.1: Delayed::Backend::ActiveRecord::Job Load (1.6ms) UPDATE "delayed_jobs" SET locked_at = '2014-04-10 15:13:09.486147', locked_by = 'host:858a3455-0b9f-4f75-9052-b419d4653703 pid:2' WHERE id IN (SELECT id FROM "delayed_jobs" WHERE ((run_at <= '2014-04-10 15:13:09.485453' AND (locked_at IS NULL OR locked_at < '2014-04-10 11:13:09.485486') OR locked_by = 'host:858a3455-0b9f-4f75-9052-b419d4653703 pid:2') AND failed_at IS NULL) ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
Apr 10 17:13:13 wc heroku/web.2: source=web.2 dyno=heroku.16079351.879182ef-13f0-4908-bf35-c487ccab6153 sample#load_avg_1m=0.00 sample#load_avg_5m=0.01 sample#load_avg_15m=0.01
Apr 10 17:13:14 wc heroku/web.2: source=web.2 dyno=heroku.16079351.879182ef-13f0-4908-bf35-c487ccab6153 sample#memory_total=463.75MB sample#memory_rss=153.29MB sample#memory_cache=0.00MB sample#memory_swap=310.46MB sample#memory_pgpgin=245188pages sample#memory_pgpgout=205946pages
Apr 10 17:13:14 wc app/web.2: Started GET "/user" for [IP.IP.IP.IP] at 2014-04-10 15:13:13 +0000
Apr 10 17:13:14 wc heroku/worker.1: source=worker.1 dyno=heroku.16079351.858a3455-0b9f-4f75-9052-b419d4653703 sample#load_avg_1m=0.21 sample#load_avg_5m=0.06 sample#load_avg_15m=0.02
Apr 10 17:13:14 wc heroku/worker.1: source=worker.1 dyno=heroku.16079351.858a3455-0b9f-4f75-9052-b419d4653703 sample#memory_total=157.25MB sample#memory_rss=157.14MB sample#memory_cache=0.11MB sample#memory_swap=0.00MB sample#memory_pgpgin=103179pages sample#memory_pgpgout=62923pages
Apr 10 17:13:16 wc app/web.2: Started GET "/user" for [IP.IP.IP.IP] at 2014-04-10 15:13:16 +0000
Apr 10 17:13:17 wc heroku/router: at=error code=H12 desc="Request timeout" method=GET path=/pages/planning host=www.cool-app.com request_id=c62a7ee5-11d8-4286-846a-a55861cc6a0e fwd="[IP.IP.IP.IP]" dyno=web.2 connect=2ms service=30000ms status=503 bytes=0
Apr 10 17:13:19 wc app/web.2: E, [2014-04-10T15:13:18.948990 #2] ERROR -- : worker=1 PID:12 timeout (31s > 30s), killing
Apr 10 17:13:19 wc app/worker.1: Delayed::Backend::ActiveRecord::Job Load (1.5ms) UPDATE "delayed_jobs" SET locked_at = '2014-04-10 15:13:19.489494', locked_by = 'host:858a3455-0b9f-4f75-9052-b419d4653703 pid:2' WHERE id IN (SELECT id FROM "delayed_jobs" WHERE ((run_at <= '2014-04-10 15:13:19.488845' AND (locked_at IS NULL OR locked_at < '2014-04-10 11:13:19.488874') OR locked_by = 'host:858a3455-0b9f-4f75-9052-b419d4653703 pid:2') AND failed_at IS NULL) ORDER BY priority ASC, run_at ASC LIMIT 1 FOR UPDATE) RETURNING *
Apr 10 17:13:20 wc app/web.2: E, [2014-04-10T15:13:19.689336 #2] ERROR -- : reaped #<Process::Status: pid 12 SIGKILL (signal 9)> worker=1
Apr 10 17:13:21 wc app/web.2: Disconnected from ActiveRecord
Apr 10 17:13:27 wc app/web.2: Processing by UsersController#show as HTML
[web worker 2 works properly from now on]
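web.2 still crashes, leaving users hanging for about 20 seconds before the "Application Error" screen is shown. Strangely, this happens on different pages and seems to be related to the worker running in the background.
One line that particularly puzzles me (possibly a symptom of the crash) is: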
Apr 10 17:13:19 wc app/web.2: E, [2014-04-10T15:13:18.948990 #2] ERROR -- : worker=1 PID:12 timeout (31s > 30s), killing
What does this mean? To me it looks as if the web.2 dyno was killed because worker=1 timed out, which seems a bit crazy.
The dyno configuration is:
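A likely reading of that line: it is written by the Unicorn master process inside the web.2 dyno (in Ruby's default log format, `#2` is the PID of the process that wrote the entry), and `worker=1 PID:12` is one of the Unicorn worker processes that master forked to serve requests, not Heroku's `worker.1` dyno. Unicorn's master kills any worker whose current request has been running longer than its configured `timeout`, then reaps it and forks a replacement, which matches the `reaped #<Process::Status: pid 12 SIGKILL (signal 9)> worker=1` line that follows. The H12 router entry one second earlier (`service=30000ms` on `/pages/planning`) is probably the same stuck request as seen from the Heroku router. The Ruby below is a simplified, self-contained model of that check, not Unicorn's actual source; `WorkerState` and `lazy_worker_messages` are hypothetical names.

```ruby
# Simplified model of the Unicorn master's timeout watchdog (NOT Unicorn's
# real source). WorkerState and lazy_worker_messages are hypothetical names.
WorkerState = Struct.new(:nr, :pid, :last_tick)

# Returns the log line the master would print for every worker whose current
# request has run longer than `timeout` seconds. `last_tick` is the time (in
# seconds) at which the worker last reported progress; 0 means it is idle.
def lazy_worker_messages(workers, now, timeout)
  workers.map do |w|
    next nil if w.last_tick.zero?  # idle workers are never timed out
    elapsed = now - w.last_tick
    next nil if elapsed <= timeout # still within the allowed time
    # The real master logs this line and then sends SIGKILL to w.pid.
    "worker=#{w.nr} PID:#{w.pid} timeout (#{elapsed}s > #{timeout}s), killing"
  end.compact
end

# Worker 0 is idle; worker 1 has been stuck on one request since t=1000.
workers = [WorkerState.new(0, 11, 0), WorkerState.new(1, 12, 1000)]
puts lazy_worker_messages(workers, 1031, 30)
# prints: worker=1 PID:12 timeout (31s > 30s), killing
```

In other words, the dyno itself survives: only one Unicorn child process inside it is killed and replaced, and the request it was serving fails.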
Dynos:
1x - 2 - web bundle exec unicorn -p $PORT -c ./config/unicorn.rb
1x - 1 - worker bundle exec rake jobs:work
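The `./config/unicorn.rb` referenced in the web command is not shown. As a hypothetical sketch, a Heroku-style Unicorn config with a 30 second timeout (inferred from the `31s > 30s` message) might look like this; the `worker_processes` count and the `WEB_CONCURRENCY` variable are assumptions, and the real file may differ:

```ruby
# config/unicorn.rb -- hypothetical sketch; the real file is not shown.
worker_processes Integer(ENV["WEB_CONCURRENCY"] || 3) # forked workers per dyno (assumed)
timeout 30       # master SIGKILLs a worker stuck on one request longer than this
preload_app true # load the app once in the master, then fork workers

before_fork do |server, worker|
  # The master's DB connection must not be shared with forked children.
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
    Rails.logger.info("Disconnected from ActiveRecord")
  end
end

after_fork do |server, worker|
  # Each freshly forked worker opens its own DB connection.
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
end
```

Each Unicorn worker holds a full copy of the Rails app, so `worker_processes` trades concurrency for memory. The R14 lines in the log show web.2 at 844 MB against the 512 MB quota of a 1X dyno (164.9%), with 332 MB in swap; a swapping dyno may be slow enough to push otherwise ordinary requests past the 30 s limit.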
Any ideas?
Answer (score: 1)
Solution: switched from delayed_job to sidekiq. This solved the problem. Probably just a bug specific to the delayed_job -> Heroku interaction. Maybe it would work today...
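The answer does not show the change itself. Assuming the dyno list in the question comes from a Procfile, a minimal sketch of the switch only replaces the `worker` command. Sidekiq also needs a Redis instance (for example a Redis add-on), which it finds through the `REDIS_URL` environment variable by default, and the email-sending code has to be enqueued as Sidekiq jobs instead of delayed_job jobs.

```
web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb
worker: bundle exec sidekiq
```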