I am trying to integrate Spark with Kafka. I have a Kafka consumer holding JSON data, and I want to pull it into Spark for processing. When I run the following code, an error is thrown.
bin\spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 test.py localhost:9092 maktest
My test.py is below:
import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

if __name__ == "__main__":
    sc = SparkContext(appName="PythonStreamingDirectKafkaWordCount")
    ssc = StreamingContext(sc, 2)
    brokers, topic = sys.argv[1:]
    kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers})
    lines = kvs.map(lambda x: x[1])
    print(lines)
    ssc.start()
    ssc.awaitTermination()
I get the following error:
18/12/10 16:41:40 INFO VerifiableProperties: Verifying properties
18/12/10 16:41:40 INFO VerifiableProperties: Property group.id is overridden to
18/12/10 16:41:40 INFO VerifiableProperties: Property zookeeper.connect is overridden to
<pyspark.streaming.kafka.KafkaTransformedDStream object at 0x000002A6DA9FE6A0>
18/12/10 16:41:40 ERROR StreamingContext: Error starting the context, marking it as stopped
java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute
at scala.Predef$.require(Predef.scala:224)
Traceback (most recent call last):
File "C:/Users/maws/Desktop/spark-2.2.1-bin-hadoop2.7/test.py", line 12, in <module>
ssc.start()
py4j.protocol.Py4JJavaError: An error occurred while calling o25.start.
: java.lang.IllegalArgumentException: requirement failed: No output operations registered, so nothing to execute
Answer 0 (score: 2)
You are not using a supported Spark Streaming DStream output operation.
For the pyspark API, you should use one of:
pprint()
saveAsTextFiles(prefix, [suffix])
saveAsObjectFiles(prefix, [suffix])
saveAsHadoopFiles(prefix, [suffix])
foreachRDD(func)
print() cannot be used as an output operation with pyspark, so when you follow other Spark Streaming examples written for Scala or Java, make sure you change it to pprint().
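For reference, here is a minimal sketch of the same script with output operations registered. It assumes the same broker/topic command-line arguments as above; the JSON handling inside foreachRDD (parsing each message with json.loads and printing a per-batch count) is only an illustration of where processing would go, not part of the original question.

import sys
import json

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

if __name__ == "__main__":
    sc = SparkContext(appName="PythonStreamingDirectKafkaWordCount")
    ssc = StreamingContext(sc, 2)

    brokers, topic = sys.argv[1:]
    kvs = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers})

    # Each element is a (key, value) pair; keep only the value (the JSON string).
    lines = kvs.map(lambda x: x[1])

    # pprint() is a DStream output operation, so the streaming context now has
    # something to execute on every batch and start() no longer fails.
    lines.pprint()

    # Alternatively (or additionally), register foreachRDD to process each batch.
    # The JSON parsing below is illustrative only (assumed, not from the question).
    def process(time, rdd):
        records = rdd.map(json.loads).collect()
        print("batch %s: %d records" % (time, len(records)))

    lines.foreachRDD(process)

    ssc.start()
    ssc.awaitTermination()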