Docker process stalls and gets killed

Time: 2016-11-02 23:10:06

Tags: python r bash docker dockerfile

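As the title says, the Docker process stalls and is eventually killed. My Python project runs a bash script, and one part of that script runs an R script that pulls data from InfluxDB and then processes it. When the project fetches data for a short time frame, e.g. 1-5 days, there is no problem. The trouble starts with longer time frames, such as several weeks: everything slows down so much that generating anything takes a very long time (I checked the logs), and eventually the process gets killed. When the R script pulls about 25 MB of data it is fine, but 70 MB is a different story. Could Flask + bash + R together be using too much memory? The problem does not occur when the scripts are run outside Docker.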
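One way to answer the memory question is to check which limit actually applies inside the container. The sketch below reads the cgroup memory limit from inside a running container. It assumes the usual mount point /sys/fs/cgroup and the standard file names for cgroup v2 (memory.max) and cgroup v1 (memory/memory.limit_in_bytes); the function name container_memory_limit is hypothetical, not part of any library.

```python
from pathlib import Path
from typing import Optional

# Limit files relative to the cgroup mount point (assumed to be /sys/fs/cgroup):
# cgroup v2 uses "memory.max", cgroup v1 uses "memory/memory.limit_in_bytes".
_LIMIT_FILES = ("memory.max", "memory/memory.limit_in_bytes")


def container_memory_limit(cgroup_root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Return the container's memory limit in bytes, or None if none is found."""
    for rel in _LIMIT_FILES:
        try:
            raw = (Path(cgroup_root) / rel).read_text().strip()
        except OSError:
            continue  # this cgroup version's file is not present
        if raw == "max":
            return None  # cgroup v2 spelling of "unlimited"
        return int(raw)
    return None
```

Calling container_memory_limit() from inside the container (for example via docker exec) gives a number to compare against the 25 MB and 70 MB cases. On cgroup v1 an unlimited container usually reports a very large integer rather than "max".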

Dockerfile:

FROM ubuntu
# Install system packages: Python 3, pip, R, mutt, git and LaTeX fonts
RUN apt-get clean && apt-get update && DEBIAN_FRONTEND=noninteractive apt-get upgrade -y && DEBIAN_FRONTEND=noninteractive apt-get install -y \
    python3 \
    python3-pip \
    r-base \
    r-base-dev \
    r-cran-rgl \
    mutt \
    git \
    texlive-fonts-recommended

# Install the Python requirements for the Flask app
RUN pip3 install -r ./requirements.txt
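The Dockerfile does not set any memory budget; that is decided when the container is started. A minimal sketch that builds a docker run command with explicit --memory and --memory-swap values, so the budget is known rather than inherited from the daemon default. The image tag pdf-report-app, the port 5000 and the helper name docker_run_command are all hypothetical.

```python
import shlex
from typing import List, Optional


def docker_run_command(image: str, memory: str = "2g",
                       memory_swap: Optional[str] = None,
                       port: int = 5000) -> List[str]:
    """Build a `docker run` argument list with an explicit memory limit."""
    cmd = ["docker", "run", "--rm", "--memory", memory]
    if memory_swap is not None:
        # --memory-swap is the total of memory + swap the container may use
        cmd += ["--memory-swap", memory_swap]
    # Publish the Flask port (hypothetical: 5000) and name the image last
    cmd += ["-p", f"{port}:{port}", image]
    return cmd


# Prints: docker run --rm --memory 2g --memory-swap 2g -p 5000:5000 pdf-report-app
print(shlex.join(docker_run_command("pdf-report-app", memory="2g", memory_swap="2g")))
```

Raising the limit only helps if the host (or, with Docker for Mac/Windows, the Docker VM) actually has that much RAM. Setting --memory-swap equal to --memory means the container gets no swap.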

Flask app snippet:

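import json
import os
import subprocess

from flask import Flask, request

app = Flask(__name__)

@app.route('/send', methods=['POST'])
def send():
    path = os.path.dirname(os.path.realpath(__file__))
    script = path + '/generate_pdf.sh'
    address = str(request.form['email'])
    # convert_date is the app's own helper (not shown in the question)
    start_date = convert_date(str(request.form['start_date']))
    end_date = convert_date(str(request.form['end_date']))
    command = [script, start_date, end_date, address]
    # Blocks until the whole bash -> R -> mutt pipeline finishes; the exit status is ignored
    subprocess.run(command)
    return json.dumps({
        'status': 'OK',
        'message': 'The action is completed'
    })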
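As written, the route returns 'OK' even when the pipeline dies, because the result of subprocess.run is never inspected. A minimal sketch of checking the exit status instead; run_script is a hypothetical helper, and it assumes a POSIX system, where a negative returncode means the child was ended by that signal.

```python
import signal
import subprocess
from typing import Dict, List


def run_script(command: List[str]) -> Dict[str, str]:
    """Run the report pipeline and turn its exit status into a JSON-ready dict."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return {"status": "OK", "message": "The action is completed"}
    if result.returncode < 0:
        # Negative return code: the direct child was terminated by a signal
        name = signal.Signals(-result.returncode).name
        return {"status": "ERROR", "message": f"killed by {name}"}
    # Positive non-zero code: report it with the tail of stderr
    return {"status": "ERROR",
            "message": f"exit code {result.returncode}: {result.stderr.strip()[-500:]}"}
```

In the route, return json.dumps(run_script(command)). Note that in generate_pdf.sh the R step is not the last command, so the status Flask sees comes from the final rm, not from R; the SIGKILL itself only becomes visible if the script uses set -e or checks $? right after the R line.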

generate_pdf.sh:

#!/bin/bash
start_date="$1"
end_date="$2"
address="$3"
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
report_name="\"$DIR/my_document.pdf\""
R -e "rmarkdown::render('$DIR/generate_document.Rmd', output_file = $report_name)" --args "$start_date" "$end_date"
report_name="$DIR/my_document.pdf"
echo | mutt -s "Generated document" -a "$report_name" -- "$address"
rm "$report_name"
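When a command in a bash script is killed by a signal, bash prints a line such as "/app/generate_pdf.sh: line 8:    58 Killed" and sets $? to 128 plus the signal number. A small sketch for decoding such statuses; signal_from_shell_status is a hypothetical helper name.

```python
import signal
from typing import Optional


def signal_from_shell_status(status: int) -> Optional[str]:
    """Map a shell-style exit status (bash's $?) to a signal name, if any.

    Shells report "terminated by signal N" as exit status 128 + N.
    """
    if status > 128:
        try:
            return signal.Signals(status - 128).name
        except ValueError:
            return None  # not a signal number known on this platform
    return None
```

A status of 137 therefore points at SIGKILL, which is what the kernel OOM killer sends, although any other process sending SIGKILL produces the same status.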

R script snippet:

library(influxdbr)
library(lubridate)  # provides days()

where.clause <- paste0("time >= '",
                       start.date,
                       "' AND time <= '",
                       as.character(as.Date(end.date) + days(1)),
                       "'")
con <- influxdbr::influx_connection(host = "localhost",
                                    port = 8086,
                                    user = "root",
                                    pass = "root")
select.query <- 'id, name, surname, car, employment_status'
rows <- influx_select(con, db = 'my_db', select.query, from = 'workers',
                      where = where.clause)
rows <- as.data.frame(rows, stringsAsFactors = FALSE)
if (is.data.frame(rows) && nrow(rows) == 0) {
  cat('No data could be obtained from the database.', sep = '\n')
  knitr::knit_exit()
}
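The where clause spans the whole requested period, so a multi-week request loads everything in a single query. One possible mitigation, not taken from the question and sketched in Python rather than R, is to split the date range into fixed windows and query each one separately. The helpers where_clause and chunked_where_clauses are hypothetical; the chunks use a half-open upper bound (<) so a point exactly at midnight is not fetched twice, unlike the original's <=.

```python
from datetime import date, timedelta
from typing import List


def where_clause(start: date, end: date, inclusive_end: bool = True) -> str:
    """Build an InfluxQL time filter from start through the end of the `end` day."""
    upper = (end + timedelta(days=1)).isoformat()
    op = "<=" if inclusive_end else "<"
    return f"time >= '{start.isoformat()}' AND time {op} '{upper}'"


def chunked_where_clauses(start: date, end: date, days_per_chunk: int = 7) -> List[str]:
    """Split [start, end] into windows of at most days_per_chunk days."""
    if days_per_chunk < 1:
        raise ValueError("days_per_chunk must be at least 1")
    clauses = []
    current = start
    while current <= end:
        chunk_end = min(current + timedelta(days=days_per_chunk - 1), end)
        # Half-open upper bound so consecutive windows do not overlap at midnight
        clauses.append(where_clause(current, chunk_end, inclusive_end=False))
        current = chunk_end + timedelta(days=1)
    return clauses
```

Each clause could then be passed to influx_select in turn, processing and discarding each batch before fetching the next; whether that is workable depends on how much of the data the R Markdown document needs in memory at once.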

Below is the log I get when running the application for a request that should pull about 74 MB of data:

....
label: unnamed-chunk-4 (with options) 
List of 3
 $ echo   : logi FALSE
 $ message: logi FALSE
 $ warning: logi FALSE

Success: (204) No Content
/app/generate_pdf.sh: line 8:    58 Killed
....
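A bare "Killed" usually means the process received SIGKILL, and the kernel OOM killer is the most common sender. If that is the cause, the host's kernel log should contain a line with "Killed process <pid> (<command>)". A minimal sketch for extracting those entries from dmesg output; oom_victims is a hypothetical helper, and the wording around that phrase varies between kernel versions.

```python
import re
from typing import List, Tuple

# Kernel OOM-killer lines contain "Killed process <pid> (<command>)"
_OOM_RE = re.compile(r"Killed process (\d+) \(([^)]*)\)")


def oom_victims(kernel_log: str) -> List[Tuple[int, str]]:
    """Return (pid, command) pairs for processes the OOM killer terminated."""
    return [(int(pid), comm) for pid, comm in _OOM_RE.findall(kernel_log)]
```

Feed it the output of dmesg run on the Docker host (or inside the Docker VM), since containers often cannot read the kernel log. The PID there is the host PID, so it will not match the 58 printed inside the container's PID namespace.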

The application works perfectly outside Docker.

When rows <- influx_select(...) is called, the data is fetched in raw form. Before it is converted to a data frame it is already large: 24 MB, 70 MB or even more.
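The size of the raw result is only part of the picture; what likely matters is the peak resident memory of the R process, which briefly holds both the raw result and the converted data frame. A minimal sketch for measuring that peak from Python, assuming Linux (where ru_maxrss is in kilobytes); run_and_report_peak_rss is a hypothetical helper.

```python
import resource
import subprocess
from typing import List


def run_and_report_peak_rss(command: List[str]) -> int:
    """Run a command and return the largest child peak RSS seen so far, in KiB.

    RUSAGE_CHILDREN covers terminated, waited-for descendants (so R started
    by bash counts), but ru_maxrss is the peak of the single largest one,
    not of the whole process tree, and may come from an earlier child.
    """
    subprocess.run(command, check=False)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
```

Comparing the value for a 1-5 day request with a multi-week one, inside and outside Docker, would show how close the R step gets to the container's limit.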

When I ran the script manually inside Docker, the R script got further:

....
label: unnamed-chunk-8 (with options) 
List of 4
 $ echo      : logi FALSE
 $ message   : logi FALSE
 $ fig.align : chr "left"
 $ fig.height: num 7

Quitting from lines 72-76 (generate_document.Rmd) 
Error in system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE) : 
cannot popen '/usr/bin/which 'pdfcrop' 2>/dev/null', probable reason 'Cannot allocate memory'
...
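"cannot popen ... Cannot allocate memory" typically means the process creation behind popen() failed with ENOMEM. For an R process that already holds a lot of memory, this can happen when the kernel refuses to commit memory for the child, depending on the overcommit policy. A minimal sketch for inspecting the relevant settings from inside the container; parse_meminfo and memory_snapshot are hypothetical helpers.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value} (values mostly in kB)."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Field:" prefix
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def memory_snapshot(proc: str = "/proc") -> Dict[str, int]:
    """Read meminfo plus the vm.overcommit_memory policy (Linux only)."""
    with open(f"{proc}/meminfo") as f:
        info = parse_meminfo(f.read())
    with open(f"{proc}/sys/vm/overcommit_memory") as f:
        info["overcommit_memory"] = int(f.read().strip())
    return info
```

Keep in mind that /proc/meminfo inside a container normally shows the host's (or Docker VM's) totals, not the container's cgroup limit. overcommit_memory is 0 (heuristic), 1 (always overcommit) or 2 (strict accounting).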