I am trying to read data from an Amazon Kinesis Data Stream using Spark Streaming in Python with the KinesisUtils package, but I get an error.
from pyspark import SparkContext, StorageLevel
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream
sc = SparkContext.getOrCreate()
ssc = StreamingContext(sc, 1)
# APPNAME, STREAMNAME, REGIONNAME, ENDPOINTURL and CHECKPOINTINTERVAL are constants defined here
kinesisStream = KinesisUtils.createStream(
    ssc, APPNAME, STREAMNAME, ENDPOINTURL,
    REGIONNAME, InitialPositionInStream.TRIM_HORIZON, CHECKPOINTINTERVAL, StorageLevel.MEMORY_AND_DISK_2)
kinesisStream.pprint()
ssc.start()
ssc.awaitTermination()
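The constants passed to createStream are not shown in the script above. As a rough sketch only, they might look like the following. Every value here is a hypothetical placeholder (application name, stream name and region are made up), and the endpoint is assumed to follow the usual regional Kinesis URL pattern.

```python
# Hypothetical placeholder values; substitute your own.
APPNAME = "my-kinesis-app"        # assumed: name of the consumer application
STREAMNAME = "my-kinesis-stream"  # assumed: name of the Kinesis data stream
REGIONNAME = "us-east-1"          # assumed: AWS region of the stream

# Assumed regional endpoint pattern for Kinesis Data Streams.
ENDPOINTURL = "https://kinesis.{}.amazonaws.com".format(REGIONNAME)

# Assumed to be expressed in seconds.
CHECKPOINTINTERVAL = 2
```

The region is kept in a single variable so that REGIONNAME and ENDPOINTURL cannot drift apart.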
I run this on EMR with the following command:
spark-submit --deploy-mode cluster --jars s3://bucket/spark-streaming-kinesis-asl_2.11-2.4.3.jar s3PathToMainPyFile
I get the following error:
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/mnt/yarn/usercache/hadoop/appcache/application_1565898995408_0003/container_1565898995408_0003_01_00001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/mnt/yarn/usercache/hadoop/appcache/application_1565898995408_0003/container_1565898995409_0003_01_00001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
Py4JNetworkError: Error while receiving
Traceback (most recent call last):
File "spark_sentiment2.py", line 24, in <module>
REGIONNAME, InitialPositionInStream.TRIM_HORIZON, CHECKPOINTINTERVAL, StorageLevel.MEMORY_AND_DISK_2)
File "/mnt/yarn/usercache/hadoop/appcache/application_1565898995408_0003/container_1565898995408_0003_01_00001/pyspark.zip/pyspark/streaming/kinesis.py", line 92, in createStream
File "/mnt/yarn/usercache/hadoop/appcache/application_1565898995408_0003/container_1565898995408_0003_01_00001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/mnt/yarn/usercache/hadoop/appcache/application_1565898995408_0003/container_15658989954080003_01_00001/py4j-0.10.7-src.zip/py4j/protocol.py", line 336, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o66.createStream