Question: The computation task can easily be parallelized, but it requires a real-time response.
There can be two approaches:
1. Use Celery: implement the parallel runs from scratch.
2. Use Spark: run the jobs in parallel on the Spark framework.
I think Spark is better in terms of scalability. But is it feasible to use Spark as the backend of a web application?
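To make option 1 concrete, here is a minimal sketch of what I have in mind for the Celery route; the task name `compute`, the Redis broker/backend URLs, and the timeout are placeholders I made up for illustration, not part of the actual system.

```python
# tasks.py -- hypothetical Celery task for the parallelizable computation
from celery import Celery

app = Celery("tasks",
             broker="redis://localhost:6379/0",     # assumed broker
             backend="redis://localhost:6379/1")    # assumed result backend

@app.task
def compute(chunk):
    # Placeholder for the real computation on one chunk of data.
    return sum(x * x for x in chunk)


# In the web backend: dispatch the task to a worker and wait briefly
# so the HTTP request still gets a near-real-time answer.
def handle_request(chunk):
    result = compute.delay(chunk)      # runs on any available worker
    return result.get(timeout=2)       # raises if no worker answers in time
```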
Answer 0: (score: 2)
In addition to the answer above, there are other aspects to be decided.
Choosing streaming can help you get data into the cluster faster, but it does not guarantee the response time a web application needs. You need to look at HBase, and at Solr if you are doing search.
Spark is certainly better and faster than other batch-processing frameworks; for streaming there may be other options as well. As mentioned above, you should weigh these parameters when choosing.
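To illustrate the serving-layer point, below is a minimal sketch of reading precomputed results out of HBase from the web tier using the happybase client; the Thrift host, table name, and row-key scheme are assumptions for illustration only, not something prescribed by this answer.

```python
# Hypothetical low-latency lookup of results that Spark jobs have already
# written into HBase, so no Spark job runs inside the web request.
import happybase

connection = happybase.Connection("hbase-thrift-host")   # assumed Thrift gateway
table = connection.table("precomputed_results")          # assumed table name

def lookup(user_id):
    # A single-row HBase read is typically fast enough for a web request.
    row = table.row(user_id.encode("utf-8"))
    return {key.decode(): value.decode() for key, value in row.items()}
```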
Answer 1: (score: 1)
Celery is really a good technology for distributed streaming-style processing, and it supports Python, which is itself strong for computation and easy to write. A streaming application built on Celery supports many features as well, with little overhead on the CPU. A minimal fan-out sketch is shown below.
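As a rough illustration of how little code a Celery fan-out takes (the task `square`, the Redis URLs, and the input range are made up for this sketch):

```python
# Fan a batch of independent subtasks out to Celery workers and collect results.
from celery import Celery, group

app = Celery("fanout",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

@app.task
def square(x):
    return x * x

if __name__ == "__main__":
    job = group(square.s(i) for i in range(100))       # 100 independent subtasks
    print(sum(job.apply_async().get(timeout=10)))      # gather results from workers
```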
Spark supports various programming languages: Java, Scala, and Python. It is not pure streaming; it is micro-batch streaming, as per the Spark documentation.
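To show what micro-batch streaming looks like in practice, here is a minimal PySpark Structured Streaming sketch; the socket source, the port, and the one-second trigger interval are assumptions chosen only to make the batch interval visible.

```python
# PySpark Structured Streaming: data is processed in small micro-batches,
# here triggered roughly once per second (interval chosen for illustration).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("micro-batch-demo").getOrCreate()

# Read lines from a local socket (assumed test source, e.g. `nc -lk 9999`).
lines = (spark.readStream.format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

counts = lines.groupBy("value").count()

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .trigger(processingTime="1 second")   # micro-batch interval
         .start())
query.awaitTermination()
```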
If your task can be fulfilled by streaming alone and you do not need SQL-like features, then Celery will be the better fit. But if you need various features along with streaming, then Spark will be better. In that case, consider the scenario of how many batches of data per second your application will generate.