Dynamic actions on a Spark Streaming context

Asked: 2017-01-04 00:40:15

Tags: apache-spark pyspark spark-streaming

I'm running a standalone Spark 2.0 cluster that will execute many small jobs. What I want to do is run multiple actions on a single DStream, because dedicating even one core per job is too expensive. I'm using PySpark, and my code looks like this:

# DStream reading data from Apache Kafka.
stream_context = KafkaUtils.createDirectStream(...)

# Persisting the stream so each batch is computed once and shared.
...

# Iterator to run different jobs.
iterator = {1: {'filter': ['metrics_of_kind_A', 'metrics_of_kind_AA']},
            2: {'filter': ['metrics_of_kind_B']}}

# Multiple actions on the same data set.
for _, value in iterator.items():
    # Bind the filter list as a default argument; a bare closure would
    # capture the loop variable and every branch would test the last filter.
    filtered_context = stream_context.filter(
        lambda x, kinds=value['filter']: x['metric_type'] in kinds)
    filtered_context.pprint()
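For reference, here is a minimal self-contained sketch of what I'm attempting; the broker address, topic name, batch interval, and the JSON `metric_type` field are placeholders I've made up for illustration:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
import json

sc = SparkContext(appName="multi-action-stream")
ssc = StreamingContext(sc, batchDuration=10)  # 10-second batches (placeholder)

# Direct Kafka stream; broker and topic names are hypothetical.
raw_stream = KafkaUtils.createDirectStream(
    ssc,
    topics=['metrics'],
    kafkaParams={'metadata.broker.list': 'localhost:9092'})

# Kafka records arrive as (key, value) pairs; parse the JSON value.
stream = raw_stream.map(lambda kv: json.loads(kv[1]))

# Cache each batch so the filtered outputs below don't re-read Kafka.
stream.persist()

iterator = {1: {'filter': ['metrics_of_kind_A', 'metrics_of_kind_AA']},
            2: {'filter': ['metrics_of_kind_B']}}

for _, value in iterator.items():
    # Default argument pins the filter list for this iteration.
    filtered = stream.filter(
        lambda x, kinds=value['filter']: x['metric_type'] in kinds)
    filtered.pprint()

ssc.start()
ssc.awaitTermination()

As far as I understand, all of these pprint() output operations get registered on the same StreamingContext and run one after another within each batch by default; the undocumented spark.streaming.concurrentJobs setting is sometimes raised to let them run in parallel, but I'm not sure that's the right approach here.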

Is there a way to do this?

0 Answers:

No answers yet.