I am running a simple Spark application on EMR.
I am using:
emr-5.11.0
Hadoop distribution: Amazon 2.7.3
Spark version: Spark 2.2.1
The Spark code I am trying to run:
from pyspark import SparkContext, SparkConf

if __name__ == "__main__":
    conf = SparkConf().setAppName("WordCountApp")
    sc = SparkContext(conf=conf)
    # Read the input file, spread it over 6 partitions, and split each line into words
    tokenized = sc.textFile("s3://XXXXX/XXXXX/README.md").repartition(6).flatMap(lambda line: line.split(" "))
    # Pair each word with 1 and sum the counts per word
    wordCounts = tokenized.map(lambda word: (word, 1)).reduceByKey(lambda v1, v2: v1 + v2)
    wordCounts.saveAsTextFile("s3://XXXXXX/XXXXX/word_count")
    sc.stop()
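For reference, here is a minimal sketch of what the job is meant to compute, written in plain Python so it can be checked without a cluster. The function name `count_words` and the sample lines are illustrative assumptions and are not part of the job above.

```python
from collections import Counter


def count_words(lines):
    """Mirror flatMap(split) -> map((word, 1)) -> reduceByKey(add) without Spark."""
    counts = Counter()
    for line in lines:
        # Same tokenization as the job: split on a single space character.
        for word in line.split(" "):
            counts[word] += 1
    return dict(counts)


# Hypothetical sample input standing in for the lines of README.md.
sample = ["spark on emr", "spark on yarn"]
result = count_words(sample)
print(result)
```

This only reproduces the transformation logic locally; it does not exercise YARN or S3.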
Submitting the code:
spark-submit --master yarn --deploy-mode cluster s3://code/wordcount.py
I would greatly appreciate any help.
This is what I get in the logs:
18/07/08 18:13:16 INFO YarnRMClient: Registering the ApplicationMaster
Exception in thread "main" java.lang.NoSuchFieldError: DEFAULT_DECOMMISSIONING_TIMEOUT
    at org.apache.spark.deploy.yarn.YarnAllocator.<init>(YarnAllocator.scala:79)
    at org.apache.spark.deploy.yarn.YarnRMClient.register(YarnRMClient.scala:77)
    at org.apache.spark.deploy.yarn.ApplicationMaster.registerAM(ApplicationMaster.scala:359)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:409)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:254)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:764)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:762)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
The log continues:
18/07/08 18:13:19 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1047
18/07/08 18:13:19 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at coalesce at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0, 1))
18/07/08 18:13:19 INFO YarnClusterScheduler: Adding task set 0.0 with 2 tasks
18/07/08 18:13:20 WARN ApplicationMaster$AMEndpoint: Container allocator is not ready to request executors yet.
18/07/08 18:13:20 WARN ExecutorAllocationManager: Unable to reach the cluster manager to request 1 total executors!
18/07/08 18:13:21 WARN ApplicationMaster$AMEndpoint: Container allocator is not ready to request executors yet.
18/07/08 18:13:21 WARN ExecutorAllocationManager: Unable to reach the cluster manager to request 1 total executors!
18/07/08 18:13:22 WARN ApplicationMaster$AMEndpoint: Container allocator is not ready to request executors yet.
18/07/08 18:13:22 WARN ExecutorAllocationManager: Unable to reach the cluster manager to request 1 total executors!
18/07/08 18:13:23 WARN ApplicationMaster$AMEndpoint: Container allocator is not ready to request executors yet.