Question

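I have a Phoenix application running in production, and I noticed that whenever an error occurs it crashes the whole application, whereas in development it does not. At first I tried to catch every error, but then I figured that is not how Elixir/Erlang is meant to work. When I try the same operation in development, the error is logged only once and the crashed task is restarted, while in prod the same error is logged several times and the whole application crashes. I'm not sure whether it is related to my configuration; this is my current production Endpoint configuration: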

config :appname, AppName.Web.Endpoint,
  on_init: {AppName.Web.Endpoint, :load_from_system_env, []},
  cache_static_manifest: "priv/static/cache_manifest.json",
  http: [port: {:system, "PORT"}],
  url: [host: "localhost", port: {:system, "PORT"}],
  root: ".",
  debug_errors: false,
  server: true,
  code_reloader: false,
  check_origin: false,
  version: Mix.Project.config[:version],
  secret_key_base: System.get_env("SECRET_KEY_BASE"),
  watchers: []

This is the dev.exs file:

config :appname, AppName.Web.Endpoint,
  http: [port: 4000],
  debug_errors: true,
  code_reloader: true,
  check_origin: false,
  secret_key_base: "rFiGCabqtoBaPZUZLoGaRuhgbBkynQazMnI2dpxN4aQEJzyQx0J7beyU2AZ0yMYO",
  watchers: [node: ["node_modules/brunch/bin/brunch", "watch", "--stdin",
                    cd: Path.expand("../assets", __DIR__)]]

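I hope it is related to the configuration; it is rather annoying to have to restart the application over minor errors that it should be able to survive.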

Edit: I just noticed that the application sometimes crashes and sometimes doesn't. I'm logging events to the terminal and saw this:

(Postgrex.Error) ERROR 22001 (string_data_right_truncation): value too long for type character varying(255)
    (ecto) lib/ecto/adapters/sql.ex:571: Ecto.Adapters.SQL.struct/7
    (ecto) lib/ecto/repo/schema.ex:467: Ecto.Repo.Schema.apply/4
    (ecto) lib/ecto/repo/schema.ex:276: anonymous fn/13 in Ecto.Repo.Schema.do_update/4
    (euridime) lib/euridime/telegram/handlers/keyboard/keyboard.ex:331: Euridime.Keyboard.set_user_wallet/2
    (euridime) lib/euridime/telegram/handlers/keyboard/keyboard.ex:136: Euridime.Keyboard.check_command/1
    (elixir) lib/enum.ex:645: Enum."-each/2-lists^foreach/1-0-"/2
    (elixir) lib/enum.ex:645: Enum.each/2
    (euridime) lib/euridime/telegram/task.ex:9: Euridime.Task.pull_updates/1
Function: &Euridime.Task.pull_updates/0
    Args: []

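P.S. This is just the error from the last crash, which happens to be Postgres-related; the application also crashes for other reasons.

This is the error that gets logged multiple times; sometimes it is logged only once and the application survives. So why does it crash here instead of restarting? I think it just restarts too many times and then stays down? How can I avoid this?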

Edit 2: the start callback:

  def start(_type, _args) do
    import Supervisor.Spec

    # Define workers and child supervisors to be supervised
    children = [
      # Start the Ecto repository
      supervisor(Euridime.Repo, []),
      # Start the endpoint when the application starts
      supervisor(Euridime.Web.Endpoint, []),

      worker(Task, [Euridime.Task, :pull_updates, []], id: :pull_updates),
      worker(Euridime.DETS, []),
      worker(Euridime.Emailer, []),
    ]

    # Registry
    gen = [
      worker(Euridime.Server, []),
      worker(Euridime.Notify, [], restart: :transient),
      worker(Euridime.PayService, [], restart: :transient)
    ]
    supervise(gen, strategy: :simple_one_for_one)

    opts = [strategy: :one_for_one, name: Euridime.Supervisor]
    Supervisor.start_link(children, opts)
  end

Answer 1

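It looks like I had a bad strategy in my supervision tree: I had :one_for_one instead of :one_for_all, and that was causing the crash. I'm not sure of the technical explanation for which processes need to be restarted so that the application doesn't crash. It is resolved for now, but I need to keep testing this behavior in production to confirm.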

Answer 2

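You will want to increase max_restarts and max_seconds. By default they are 3 and 5 respectively. This means the supervisor will restart child processes at most 3 times within any 5-second window; if a child crashes a 4th time within 5 seconds, the supervisor will not restart it. Instead, the supervisor itself shuts down together with its remaining children, and because Euridime.Supervisor is the application's top-level supervisor, that takes the whole application down.

For details, see https://hexdocs.pm/elixir/Supervisor.html.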
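
As a rough sketch of how these options might be passed to the supervisor: the Agent child and the Demo.Supervisor name below are hypothetical stand-ins for the question's `children` list and Euridime.Supervisor, and the values 10 and 5 are purely illustrative, not a recommendation.

```elixir
# Hypothetical stand-in for the `children` list built in start/2.
children = [
  %{id: :demo_agent, start: {Agent, :start_link, [fn -> 0 end]}}
]

opts = [
  strategy: :one_for_one,
  name: Demo.Supervisor,
  # Allow up to 10 restarts within any 5-second window (illustrative values).
  max_restarts: 10,
  max_seconds: 5
]

{:ok, sup} = Supervisor.start_link(children, opts)
```

A higher limit only buys tolerance for bursts of failures. A child that fails immediately on every restart will probably still exhaust the limit, so the underlying error (here, the value that is too long for the varchar(255) column) still needs to be fixed.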

Phoenix application crashes in production but not in development
