Memory leak serving a TensorFlow Inception model via Flask

Time: 2017-06-08 20:20:07

Tags: linux flask memory-leaks tensorflow

Below is the code I use to serve the Inception model via Flask. Unfortunately, Linux keeps killing the process in the background to free up memory.

From the kernel log I found that the server.py Python process was killed by the Linux OOM killer: there was not enough free memory to satisfy allocations requested by other programs, so the kernel freed memory by choosing to kill the Python process.

Note the memory footprint of the process (the total_vm column, counted in 4 kB pages): roughly 1.6 to 1.7 million pages, i.e. about 6.3 to 6.9 GB, which seems extremely high to me.

[ pid ]   uid  tgid  total_vm      rss  nr_ptes  swapents  oom_score_adj  name
[ 8640]     0  8640   1654607  1436423     3080     35564              0  python
[32139]     0 32139   1712754  1495071     3195     34153              0  python
[25121]     0 25121   1586597  1390072     2943      9795              0  python

Jun  8 19:15:32 incfs1002 kernel: [16448663.210440] Out of memory: Kill process 8640 (python) score 565 or sacrifice child
Jun  8 19:15:32 incfs1002 kernel: [16448663.211941] Killed process 8640 (python) total-vm:6618428kB, anon-rss:5745664kB, file-rss:28kB

Jun  8 18:21:16 incfs1002 kernel: [16445405.714834] Out of memory: Kill process 32139 (python) score 587 or sacrifice child
Jun  8 18:21:16 incfs1002 kernel: [16445405.714878] Killed process 32139 (python) total-vm:6851016kB, anon-rss:5980284kB, file-rss:0kB

Jun  7 17:40:55 incfs1002 kernel: [16356536.627117] Out of memory: Kill process 25121 (python) score 537 or sacrifice child
Jun  7 17:40:55 incfs1002 kernel: [16356536.627157] Killed process 25121 (python) total-vm:6346388kB, anon-rss:5560164kB, file-rss:124kB
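
As a sanity check (assuming the usual 4 kB page size), converting the total_vm column from pages reproduces the kernel's kB figures exactly:

pages = 1654607                # total_vm for pid 8640
print(pages * 4)               # 6618428 kB, matches "total-vm:6618428kB"
print(pages * 4 / 1024 ** 2)   # ~6.31 GiB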

Code:

import os
from flask import Flask, request, jsonify
from flask_cors import CORS, cross_origin
import tensorflow as tf

ALLOWED_EXTENSIONS = set(['jpg', 'jpeg'])

app = Flask(__name__)
CORS(app)
app.config['UPLOAD_FOLDER'] = 'uploads'


def allowed_file(filename):
    # Compare the real extension; filename[-3:] would miss 'jpeg'
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS


@app.route('/classify', methods=['GET'])
@cross_origin()
def classify_image():
    result = {}
    filename = request.args.get('file')
    # Check that a filename was supplied
    if filename:
        image_path = os.path.join(app.config['UPLOAD_FOLDER'], filename)
        image_data = tf.gfile.FastGFile(image_path, 'rb').read()

        label_lines = [line.strip() for line in tf.gfile.GFile("output_labels.txt")]

        with tf.gfile.FastGFile("output_graph.pb", 'rb') as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
            _ = tf.import_graph_def(graph_def, name='')

        with tf.Session() as sess:
            # Feed the image data as input to the graph and get the first prediction
            softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
            predictions = sess.run(softmax_tensor,
                                   {'DecodeJpeg/contents:0': image_data})
            # Sort to show labels of the first prediction in order of confidence
            top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]

            low_confidence = 0
            for node_id in top_k:
                human_string = label_lines[node_id]
                score = predictions[0][node_id]
                # print('%s (score = %.2f)' % (human_string, score))
                if score < 0.90:
                    low_confidence += 1
                result[human_string] = str(score)

            if low_confidence >= 2:
                result['error'] = 'Unable to classify document type (Passport/Driving License)'

    return jsonify(result)


if __name__ == '__main__':
    app.run(debug=True)

1 Answer:

Answer 0 (score: 0)

I had the same problem, but my code looked like this:

import tensorflow as tf


class Classifier:
    def __init__(self):
        # One session for the lifetime of the object, shared across requests
        self.sess = tf.Session()

    def predict(self, image):
        return self.sess.run(image)

I can't say whether Flask with Caffe has the same problem.
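
For reference, the same single-session pattern applied to the Inception code in the question would look roughly like this (a sketch reusing the question's own file and tensor names; the graph and session live at module level instead of being recreated on every request):

import tensorflow as tf

# Load labels and graph once at import time, not per request
label_lines = [line.strip() for line in tf.gfile.GFile("output_labels.txt")]

with tf.gfile.FastGFile("output_graph.pb", 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

# A single long-lived session reused by every request
sess = tf.Session()
softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')


def classify(image_data):
    # image_data is the raw JPEG bytes, as in the question's route
    return sess.run(softmax_tensor, {'DecodeJpeg/contents:0': image_data})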

I solved the problem using gunicorn (the trailing server:app below is a placeholder for your own module:app entry point):

gunicorn \
  --reuse-port \
  -b ${host}:${port} \
  --reload \
  --log-level debug \
  --workers 2 \
  --max-requests 10000 \
  --max-requests-jitter 200 \
  server:app

Note that --max-requests makes gunicorn restart each worker after it has handled 10000 requests, so whatever memory the worker has leaked is reclaimed when it exits.
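
--max-requests-jitter 200 adds a random offset of up to 200 requests to each worker's restart threshold, so the workers do not all recycle at the same moment, and with --workers 2 one worker keeps serving while the other restarts.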