Flask application with slow queries, multiple client users, hosted on Kubernetes

Time: 2017-06-12 17:12:28

Tags: python flask kubernetes

I have a Flask application in which I want to accomplish the following:

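  1. Have an endpoint that runs a series of queries.
  2. This endpoint needs to respond to the HTTP request within a limited number of seconds.
  3. The queries can take several minutes to complete, so I need them to run in a separate thread, with multiple clients polling the server every so often to see whether there is new data to return to them.
  4. Ideally, host multiple instances of the pod on Kubernetes.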
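My implementation below has several problems:

  1. The poll endpoint seems larger than it should be: most of it just deals with the queue of query results and makes sure that each client gets its own results back, and not someone else's.
  2. I am not sure what is going on, but when I host multiple instances of this pod on Kubernetes, it looks as though some polling requests from some users are sent to an instance where their uuid does not exist.
  3. I would like to understand what I am doing wrong with threads and queues, because this feels like a hacky way to do it. Also, how do I make the results of these queries available to all of the Kubernetes instances?

Thanks!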

      from flask import Flask, render_template, request, jsonify, g
      from Queue import Queue
      from threading import Thread
      from time import sleep
      
      
      app = Flask(__name__, template_folder='Templates')
      
      
      @app.route('/')
      def index():
          return render_template('index.html')
      
      
      @app.before_first_request
      def before_first_request():
          g.output = Queue()
          g.data_results = {}
          return ""
      
      
      @app.route('/data')
      def data():
          """
          Endpoint hit to fire off a request for data from a given user (uuid)
          """
          params = request.args.to_dict()
          uuid = params['uuid']
          # Create a list for this user, to store their results
          g.data_results[uuid] = [] 
          list_of_queries = ["SELECT * FROM tbl1;", 
                             "SELECT * FROM tbl2;", 
                             "SELECT * FROM tbl3;"]
          for query in list_of_queries:
              t = Thread(target=worker, args=(query, uuid, g.output))
              t.daemon = True
              t.start()
          return jsonify({'msg':'Queries started'})
      
      
      def worker(*args):
          query, uuid, output = args
          # Will actually be something like `result = run_query(query)`
          result = {'uuid':uuid}
          sleep(10)
          output.put(result)
      
      
      @app.route('/poll')
      def poll():
          """
          Endpoint hit every x seconds from frontend
          to see if the data is ready
          """
          params = request.args.to_dict()
          uuid_from_client = params['uuid']
          # If client polls for result, but server has no record of this uuid
          # This can happen in kubernetes with multiple instances running
          if g.data_results.get(uuid_from_client) is None:
              return jsonify({'msg':'pong', 'data':None, 'freshdata':None})
          try:
              output = g.output
              # This line throws an error if there is nothing to get
              results = output.get(False)
              output.task_done()
              # What is the uuid associated with the most recently returned data
              # More than 1 chunk of data can be in here
              uuid_from_data = results['uuid']
              g.data_results[uuid_from_data].append(results)
          except:
              uuid_from_data = None
              results = None
      
          results_for_client_uuid = g.data_results[uuid_from_client]
          if len(results_for_client_uuid) > 0:
              res = results_for_client_uuid.pop(0)
          else:
              res = None
          return jsonify({'msg':'pong', 'data':res})
      
      
      if __name__ == "__main__":
          with app.app_context():
              app.run(host='0.0.0.0')
      

1 answer:

Answer 0 (score: 0)

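Set up your application architecture around queueing software, so that the concerns are separated: the web process only accepts requests and answers polls, while separate workers run the slow queries.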

Here is a great article that may give you some insight: http://blog.gorgias.io/deploying-flask-celery-with-docker-and-kubernetes/ and another one: https://endocode.com/blog/2015/03/24/using-googles-kubernetes-to-build-a-distributed-task-management-cluster/
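To make the separation of concerns concrete, here is a minimal sketch using only the Python 3 standard library. It is not the setup from the linked articles: the names `ResultStore`, `start_workers`, `submit` and `poll` are hypothetical, and `run_query` is a placeholder for the real database call. Results are kept per uuid, so a poll never has to sort through other clients' data. The in-memory queue and store only work inside a single process; with several pods they would presumably need to be replaced by an external broker and a shared store that every instance can reach.

```python
import queue
import threading
import time
from collections import defaultdict


class ResultStore:
    """In-memory stand-in (hypothetical) for a store shared by all pods."""

    def __init__(self):
        self._lock = threading.Lock()
        self._results = defaultdict(list)

    def append(self, uuid, result):
        with self._lock:
            self._results[uuid].append(result)

    def pop_all(self, uuid):
        """Return and clear everything collected so far for one client."""
        with self._lock:
            return self._results.pop(uuid, [])


def run_query(query):
    # Placeholder for the real, slow database call.
    time.sleep(0.1)
    return "rows for " + query


def worker(jobs, store):
    # Pull (uuid, query) jobs forever and file each result under its uuid.
    while True:
        uuid, query = jobs.get()
        try:
            result = {"uuid": uuid, "query": query, "rows": run_query(query)}
        except Exception as exc:
            result = {"uuid": uuid, "query": query, "error": str(exc)}
        store.append(uuid, result)
        jobs.task_done()


def start_workers(jobs, store, count=3):
    threads = [threading.Thread(target=worker, args=(jobs, store), daemon=True)
               for _ in range(count)]
    for t in threads:
        t.start()
    return threads


def submit(jobs, uuid, queries):
    # What a /data endpoint would do: enqueue the work and return at once.
    for query in queries:
        jobs.put((uuid, query))


def poll(store, uuid):
    # What a /poll endpoint would do: hand back only this client's results.
    return store.pop_all(uuid)


if __name__ == "__main__":
    jobs, store = queue.Queue(), ResultStore()
    start_workers(jobs, store)
    submit(jobs, "client-a", ["SELECT * FROM tbl1;", "SELECT * FROM tbl2;"])
    submit(jobs, "client-b", ["SELECT * FROM tbl3;"])
    jobs.join()  # only for the demo; a real endpoint would not block here
    print(poll(store, "client-a"))
    print(poll(store, "client-b"))
```

In a Flask app, the queue and store would be created once at module level rather than on `flask.g`, since `g` is tied to the application context and is not meant to carry state from one request to the next.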