Dynamically configurable ZMQ filter as an ETL pipeline

Time: 2019-03-20 13:31:49

Tags: python flask etl pyzmq

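I have a stream of ZMQ messages, and I need to filter out the relevant ones and save them to MongoDB. The catch is that it should be possible to set the filter criteria dynamically.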

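I came up with this solution using multiprocessing and Flask. Basically, I use a Flask HTTP API to change filter_parameters, a global Manager dictionary that is shared with another process running the ZMQ filter itself. So my question is: is this a good solution, and what is the best practice here?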
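
For reference, the sharing mechanism on its own can be sketched as below. This is a minimal illustration, not the code from the question: `snapshot_addresses`, `worker`, and the `addr1`/`addr2` values are hypothetical names chosen for the example. It also shows one behaviour of Manager dictionaries worth knowing: only assignments to a key go through the proxy, so the address list has to be replaced as a whole rather than mutated in place.

```python
from multiprocessing import Manager, Process


def snapshot_addresses(shared):
    """Read the shared mapping once and return an immutable copy of the addresses."""
    return frozenset(shared.get("addresses", []))


def worker(shared, results):
    # Every proxy access is a round trip to the manager process, so take
    # one snapshot per message instead of indexing the proxy repeatedly.
    results.put(sorted(snapshot_addresses(shared)))


if __name__ == "__main__":
    with Manager() as manager:
        shared = manager.dict({"addresses": []})
        results = manager.Queue()

        # Reassigning the key is what propagates to other processes.
        # shared["addresses"].append(...) would only change a local copy.
        shared["addresses"] = ["addr1", "addr2"]

        p = Process(target=worker, args=(shared, results))
        p.start()
        p.join()
        print(results.get())
```

The `if __name__ == "__main__":` guard matters on platforms where child processes are started by re-importing the main module; without it, creating the Manager at import time can fail there.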

from multiprocessing import Process, Manager
from flask import Flask, jsonify, request
from pymongo import MongoClient

app = Flask(__name__)
manager = Manager()
filter_parameters = manager.dict()
filter_parameters['addresses'] = ['']

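def scan(filter_parameters):
    # Imported here so that only the worker process needs pyzmq.
    import zmq
    context = zmq.Context()
    socket = context.socket(zmq.SUB)
    socket.connect("tcp://localhost:5556")
    # Subscribe only to messages whose payload starts with 'tx'.
    socket.setsockopt_unicode(zmq.SUBSCRIBE, 'tx')
    client = MongoClient()  # defaults to localhost:27017
    db = client.transactions_db
    tx_collection = db.tx_collection
    while True:
        # Split the message into whitespace-separated fields.
        fields = socket.recv_string().split()
        save_if_matching(fields, filter_parameters, tx_collection)

def save_if_matching(fields, filter_parameters, tx_collection):
    # Named save_if_matching so the built-in filter() is not shadowed.
    # Skip messages too short to carry the fields checked below.
    if len(fields) < 8:
        return
    if fields[0] == 'tx' and fields[7] == '0':
        # Reads the current filter from the shared Manager dict.
        if fields[2] in filter_parameters['addresses']:
            tx_collection.insert_one({'tx': fields})

# args must be a tuple, hence the trailing comma.
p = Process(target=scan, args=(filter_parameters,))
p.daemon = True
p.start()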

@app.route('/zmq_buffer/set_filter_addresses', methods=["POST"])
def set_filter_addresses():
    data_jsn = request.json
    filter_parameters['addresses'] = data_jsn['addresses']
    return jsonify({'addresses':filter_parameters['addresses']}), 200

@app.route('/zmq_buffer/get_filter_addresses', methods=["GET"])
def get_filter_addresses():
    if filter_parameters['addresses']:
        resp = jsonify(filter_parameters['addresses'])
    else:
        resp = 'None'
    return resp, 200

if __name__ == '__main__':
    app.run(port=7050)
    p.terminate()
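
One way to make the filter easier to check is to separate the matching rule from the ZMQ and MongoDB I/O. The sketch below is only an illustration under assumptions: `matches` and `parse_and_match` are hypothetical helper names, and the field positions (0, 2, 7) are copied from the code above without knowing what the fields actually mean.

```python
def matches(fields, addresses):
    """Return True if a split 'tx' message passes the filter."""
    # Guard against short messages before indexing fields 2 and 7.
    if len(fields) < 8:
        return False
    return fields[0] == "tx" and fields[7] == "0" and fields[2] in addresses


def parse_and_match(message, addresses):
    """Split a raw message and return its fields if it matches, else None."""
    fields = message.split()
    return fields if matches(fields, addresses) else None


if __name__ == "__main__":
    # Made-up message with 8 whitespace-separated fields.
    sample = "tx h addrA 1 2 3 4 0"
    print(parse_and_match(sample, {"addrA"}))
    print(parse_and_match(sample, {"addrB"}))
```

Passing the addresses as a set keeps the membership test cheap when the list of addresses grows, and the function can be exercised without a running publisher or database.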

0 answers:

No answers