Segmentation fault when scheduling an R script with Airflow

Time: 2019-09-19 07:58:12

Tags: r airflow

I run an R script with Airflow (in a Docker container) and get the following error.

[2019-09-19 07:03:26,500] {{bash_operator.py:127}} INFO -  *** caught segfault ***
[2019-09-19 07:03:26,500] {{bash_operator.py:127}} INFO - address 0x55cf00000000, cause 'memory not mapped'
[2019-09-19 07:03:26,501] {{bash_operator.py:127}} INFO - 
[2019-09-19 07:03:26,501] {{bash_operator.py:127}} INFO - Traceback:
[2019-09-19 07:03:26,501] {{bash_operator.py:127}} INFO -  1: is.data.frame(x)
[2019-09-19 07:03:26,502] {{bash_operator.py:127}} INFO -  2: FUN(X[[i]], ...)
[2019-09-19 07:03:26,502] {{bash_operator.py:127}} INFO -  3: lapply(.x, .f, ...)
[2019-09-19 07:03:26,502] {{bash_operator.py:127}} INFO -  4: map(result, subset_rows, i)
[2019-09-19 07:03:26,502] {{bash_operator.py:127}} INFO -  5: `[.tbl_df`(x, ind, , drop = FALSE)
[2019-09-19 07:03:26,503] {{bash_operator.py:127}} INFO -  6: x[ind, , drop = FALSE]
[2019-09-19 07:03:26,503] {{bash_operator.py:127}} INFO -  7: FUN(X[[i]], ...)
[2019-09-19 07:03:26,504] {{bash_operator.py:127}} INFO -  8: lapply(split(x = seq_len(nrow(x)), f = f, drop = drop, ...),     function(ind) x[ind, , drop = FALSE])
[2019-09-19 07:03:26,505] {{bash_operator.py:127}} INFO -  9: split.data.frame(es6, (0:(nrow(es6) - 1)%/%50))
[2019-09-19 07:03:26,505] {{bash_operator.py:127}} INFO - 10: split(es6, (0:(nrow(es6) - 1)%/%50))
[2019-09-19 07:03:26,506] {{bash_operator.py:127}} INFO - An irrecoverable exception occurred. R is aborting now ...
[2019-09-19 07:03:27,087] {{bash_operator.py:127}} INFO - /tmp/airflowtmpuj8lcw3e/web_etl_bf_10_days7wpo7bvb: line 1:  1140 Segmentation fault      (core dumped) Rscript /usr/local/airflow/dags/scripts/r/etl_web_api_by_create_time.R -d "2019-09-05 00:00:00+00:00"
[2019-09-19 07:03:27,088] {{bash_operator.py:131}} INFO - Command exited with return code 139

The failing code is split(es6, (0:(nrow(es6) - 1)%/%50)). The data frame es6 has about 1096 rows and 20 columns.
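For reference, this call cuts the data frame into consecutive blocks of 50 rows. Below is a minimal sketch on a synthetic data frame of the same shape. The name es6_demo and its random contents are made up for illustration, since the real es6 is not shown in the question.

```r
# Synthetic stand-in for es6: 1096 rows, 20 numeric columns (illustrative only).
set.seed(1)
es6_demo <- as.data.frame(matrix(rnorm(1096 * 20), nrow = 1096, ncol = 20))

# Integer division of the 0-based row index by 50 yields one group id per row:
# rows 1-50 -> 0, rows 51-100 -> 1, and so on.
group_id <- (0:(nrow(es6_demo) - 1)) %/% 50

# split() dispatches to split.data.frame and returns a list of data frames.
chunks <- split(es6_demo, group_id)

length(chunks)                    # 22 chunks (21 full ones plus a remainder)
nrow(chunks[[length(chunks)]])    # 46 rows in the last chunk
```

If the same call works on synthetic data inside the container but fails on the real es6, that would point at the contents of es6 rather than at split itself.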

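I cannot reproduce the error reliably: when run through Airflow, the task sometimes succeeds and sometimes fails. (Also, when I run the same code through RStudio Server, it works.)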
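Since the code behaves differently under Airflow and under RStudio Server, one thing that may be worth ruling out is that the two environments use different R versions or package libraries. The sketch below prints that information at the top of the script so the two logs can be compared. The package name tibble is only inferred from the `[.tbl_df` frame in the traceback; it is an assumption, not something stated in the question.

```r
# Print the environment the script actually runs in, so the Airflow log and
# the RStudio Server console can be compared side by side.
env_report <- c(
  paste("R version:", R.version.string),
  paste("Platform:", R.version$platform),
  paste("Library paths:", paste(.libPaths(), collapse = ", "))
)
writeLines(env_report)

# Version of the package that appears in the traceback (assumed to be tibble).
for (pkg in c("tibble")) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    cat(pkg, as.character(utils::packageVersion(pkg)), "\n")
  } else {
    cat(pkg, "is not installed\n")
  }
}
```

A mismatch between the two outputs would not prove anything by itself, but it narrows down where to look.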

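I suspect the server may be running out of memory. My Linux server has 8 GB of memory in total. When I check memory while the task is running, about 1700 MB is free (according to the free -m command).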
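To test the memory hypothesis from inside the R process rather than from the host, a small logging helper could be called just before the split. This is only a sketch: log_memory is a hypothetical helper name, and the /proc/meminfo part assumes a Linux system.

```r
# Hypothetical helper: report memory usage from inside the R process.
log_memory <- function(label) {
  mem <- gc()                 # matrix with rows Ncells/Vcells; column 2 is used MB
  used_mb <- sum(mem[, 2])
  cat(sprintf("[%s] R heap in use: %.1f MB\n", label, used_mb))

  # On Linux, /proc/meminfo gives the system-wide view.
  if (file.exists("/proc/meminfo")) {
    info <- readLines("/proc/meminfo")
    cat(grep("^(MemTotal|MemAvailable):", info, value = TRUE), sep = "\n")
  }
  invisible(used_mb)
}

used <- log_memory("before split")
```

Two caveats: inside a container, /proc/meminfo usually reflects the host rather than any memory limit set on the container, and return code 139 corresponds to SIGSEGV (128 + 11), whereas a process killed for exceeding memory would normally exit with 137 (SIGKILL). Neither point rules out memory pressure, but both suggest checking other causes as well.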

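I searched online, and some people suggest that this kind of error can be caused by a bug in the function being called, in this case split.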

Edit:

After changing the call to split(as.data.frame(es6), (0:(nrow(es6)-1) %/% 50)), the new log is:

[2019-09-19 09:17:22,652] {{bash_operator.py:127}} INFO -  *** caught segfault ***
[2019-09-19 09:17:22,652] {{bash_operator.py:127}} INFO - address 0x55f600000000, cause 'memory not mapped'
[2019-09-19 09:17:22,652] {{bash_operator.py:127}} INFO - 
[2019-09-19 09:17:22,652] {{bash_operator.py:127}} INFO - Traceback:
[2019-09-19 09:17:22,652] {{bash_operator.py:127}} INFO -  1: dim(xj)
[2019-09-19 09:17:22,653] {{bash_operator.py:127}} INFO -  2: `[.data.frame`(x, ind, , drop = FALSE)
[2019-09-19 09:17:22,653] {{bash_operator.py:127}} INFO -  3: x[ind, , drop = FALSE]
[2019-09-19 09:17:22,653] {{bash_operator.py:127}} INFO -  4: FUN(X[[i]], ...)
[2019-09-19 09:17:22,653] {{bash_operator.py:127}} INFO -  5: lapply(split(x = seq_len(nrow(x)), f = f, drop = drop, ...),     function(ind) x[ind, , drop = FALSE])
[2019-09-19 09:17:22,653] {{bash_operator.py:127}} INFO -  6: split.data.frame(as.data.frame(es6), (0:(nrow(es6) - 1)%/%50))
[2019-09-19 09:17:22,653] {{bash_operator.py:127}} INFO -  7: split(as.data.frame(es6), (0:(nrow(es6) - 1)%/%50))
[2019-09-19 09:17:22,653] {{bash_operator.py:127}} INFO - An irrecoverable exception occurred. R is aborting now ...
[2019-09-19 09:17:23,179] {{bash_operator.py:127}} INFO - /tmp/airflowtmp9jy2rurg/web_etl_bf_7_days4x9d7xqz: line 1:  1220 Segmentation fault      (core dumped) Rscript /usr/local/airflow/dags/scripts/r/etl_web_api_by_create_time.R -d "2019-09-10 00:00:00+00:00"
[2019-09-19 09:17:23,179] {{bash_operator.py:131}} INFO - Command exited with return code 139

0 answers