How to trace the cause of org.apache.thrift.transport.TTransportException on Zeppelin / EMR

Date: 2017-12-14 01:45:27

Tags: apache-spark yarn amazon-emr apache-zeppelin apache-spark-ml

I'm struggling to get my code to run in Zeppelin on EMR (emr-5.10.0, Zeppelin 0.7.3, Spark 2.2.0).

The code is simple: it fits a CrossValidator over a RandomForestClassifier on a training DataFrame of about 400K samples (roughly 40K positives and 360K negatives).

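When I run a simple training (say, 100 trees with a max depth of 15), everything goes fine, but when I run a heavier test with more values in the ParamGridBuilder, I get an org.apache.thrift.transport.TTransportException, and I don't know how to track down the cause of that error.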

I'm using a cluster of three c3.8xlarge machines, with the following Spark interpreter settings in Zeppelin:

spark.executor.memory = 15g
spark.yarn.executor.memoryOverhead = 2048
spark.executor.cores = 10
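
As a sanity check on these settings, the sketch below computes the size of the YARN container each executor requests. It assumes Spark 2.2 semantics, where spark.yarn.executor.memoryOverhead is read as MiB and added on top of spark.executor.memory, and where the overhead defaults to max(384 MiB, 10% of the executor heap) when unset; the object and method names are hypothetical. None of these three settings affects the driver, whose heap is controlled separately by spark.driver.memory.

```scala
object ExecutorSizing {
  // spark.executor.memory = 15g, expressed in MiB
  val executorMemoryMiB: Int = 15 * 1024
  // spark.yarn.executor.memoryOverhead = 2048 (assumed to be MiB, as in Spark 2.2)
  val overheadMiB: Int = 2048

  // Memory YARN must grant for one executor container: JVM heap plus off-heap overhead
  def containerMiB(heapMiB: Int, overhead: Int): Int = heapMiB + overhead

  // Spark 2.2 default overhead when the setting is absent: max(384 MiB, 10% of the heap)
  def defaultOverheadMiB(heapMiB: Int): Int = math.max(384, (heapMiB * 0.10).toInt)

  def main(args: Array[String]): Unit = {
    println(s"container size: ${containerMiB(executorMemoryMiB, overheadMiB)} MiB")
    println(s"default overhead would be: ${defaultOverheadMiB(executorMemoryMiB)} MiB")
  }
}
```

With these values each executor asks YARN for 17408 MiB, and the explicit 2048 MiB overhead is already above the 1536 MiB default.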

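I played with spark.memory.fraction without success, and I also tried changing the number of executors by varying the three settings above, again without success.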
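
For reference, in Spark 2.x spark.memory.fraction (default 0.6) is applied to the JVM heap minus a fixed 300 MB reserve, and the result is the unified region shared by execution and storage; spark.memory.storageFraction (default 0.5) is the part of that region protected from eviction. The sketch below evaluates that formula for the 15g executor heap. It is a back-of-the-envelope calculation with hypothetical names, and it ignores the small gap between -Xmx and the heap size the JVM actually reports.

```scala
object UnifiedMemory {
  // Heap that Spark 2.x reserves for its own internal objects
  val ReservedMiB: Double = 300.0

  // Size of the execution + storage region for a given heap and spark.memory.fraction
  def unifiedMiB(heapMiB: Double, fraction: Double): Double =
    (heapMiB - ReservedMiB) * fraction

  // Part of that region whose cached blocks are immune to eviction (spark.memory.storageFraction)
  def protectedStorageMiB(heapMiB: Double, fraction: Double, storageFraction: Double): Double =
    unifiedMiB(heapMiB, fraction) * storageFraction

  def main(args: Array[String]): Unit = {
    val heap = 15 * 1024.0 // spark.executor.memory = 15g
    for (frac <- Seq(0.6, 0.75, 0.8))
      println(f"fraction $frac%.2f -> unified region ${unifiedMiB(heap, frac)}%.0f MiB")
  }
}
```

Raising the fraction only redistributes memory inside each executor heap; it does not add memory to the driver or to the container overhead.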

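I suspect this is a Zeppelin issue, but I can't trace the cause of the exception. I went through the logs and found no exception other than the TTransportException, which on its own is not useful.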
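
One thing the stack trace in this question does show: the Zeppelin frames are on the client side of RemoteInterpreterService (recv_interpret, called from RemoteInterpreter.interpret), so the exception is raised in the Zeppelin server while it waits for a reply from the separate Spark interpreter process. A TTransportException without a message at TIOStreamTransport.read typically means the other end closed the connection. If the underlying error was logged at all, it is therefore more likely to be in that interpreter process's own log or in the YARN container logs (for example the output of `yarn logs -applicationId <application id>`) than in the server log. The helper below is a hypothetical sketch for skimming such a log; the marker strings are common Spark/YARN failure messages chosen for illustration, not an exhaustive list.

```scala
object LogTriage {
  // Illustrative markers of common root causes (an assumption; extend as needed)
  val markers: Seq[String] = Seq(
    "java.lang.OutOfMemoryError",
    "Container killed by YARN for exceeding memory limits",
    "ExecutorLostFailure"
  )

  // Returns (1-based line number, line) for every line containing one of the markers
  def suspiciousLines(lines: Seq[String]): Seq[(Int, String)] =
    lines.zipWithIndex.collect {
      case (line, idx) if markers.exists(m => line.contains(m)) => (idx + 1, line)
    }

  def main(args: Array[String]): Unit = {
    // Usage: pass the path of a log file, e.g. an interpreter log or saved `yarn logs` output
    val source = scala.io.Source.fromFile(args(0))
    try suspiciousLines(source.getLines().toList).foreach { case (n, l) => println(s"$n: $l") }
    finally source.close()
  }
}
```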

Any help or hints on how to trace the cause of the exception would be highly appreciated.

Here is the code:

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}
import org.apache.spark.ml.linalg.Vectors

val genreIndexer = new StringIndexer()
                        .setInputCol("genre")
                        .setOutputCol("genreIndex")
                        .setHandleInvalid("skip")

val genreEncoder = new OneHotEncoder()
                        .setInputCol(genreIndexer.getOutputCol)
                        .setOutputCol("genreVec")

val featuresAssembler = new VectorAssembler()
                            .setInputCols(Array("hourOfDay", "dayOfWeek_number", "dayOfMonth", "genreVec"))
                            .setOutputCol("features")

val classifier = new RandomForestClassifier()
                    .setLabelCol("label")
                    .setFeaturesCol("features")

val paramGrid = new ParamGridBuilder()
                   .addGrid(classifier.numTrees, Array(200, 400))
                   .addGrid(classifier.maxDepth, Array(10, 20))
                   .build()

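// Stages run in order: index "genre", one-hot encode the index, assemble the feature vector, fit the forest
val pipeline = new Pipeline().setStages(Array(genreIndexer, genreEncoder, featuresAssembler, classifier))

// 3-fold cross-validation over every combination in paramGrid,
// scored by BinaryClassificationEvaluator's default metric (areaUnderROC)
val cv = new CrossValidator()
              .setEstimator(pipeline)
              .setEvaluator(new BinaryClassificationEvaluator)
              .setEstimatorParamMaps(paramGrid)
              .setNumFolds(3)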

val cvModel = cv.fit(train_df)
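
To put the heavier run in perspective, the sketch below counts the work this grid implies, using only the values in the code above. It assumes CrossValidator fits every parameter map on every fold and then refits the best map once on the full training set; the object and method names are hypothetical, and the node count is only an upper bound (real trees stop earlier, e.g. they cannot have more leaves than training rows).

```scala
object GridCost {
  val numTrees: Seq[Int] = Seq(200, 400) // classifier.numTrees grid
  val maxDepth: Seq[Int] = Seq(10, 20)   // classifier.maxDepth grid
  val numFolds: Int = 3

  // Number of parameter combinations produced by ParamGridBuilder
  def paramMaps: Int = numTrees.size * maxDepth.size

  // Pipeline fits during cross-validation, plus one refit of the best combination
  def pipelineFits: Int = paramMaps * numFolds + 1

  // Trees grown across all folds (the final refit adds 200 or 400 more, depending on the winner)
  def treesInFolds: Int = numTrees.sum * maxDepth.size * numFolds

  // Upper bound on nodes in one binary tree of the given depth (depth 0 = a single leaf)
  def maxNodes(depth: Int): Long = (1L << (depth + 1)) - 1

  def main(args: Array[String]): Unit = {
    println(s"param maps: $paramMaps, pipeline fits: $pipelineFits, trees in folds: $treesInFolds")
    println(s"max nodes per tree: depth 15 -> ${maxNodes(15)}, depth 20 -> ${maxNodes(20)}")
  }
}
```

This gives 13 pipeline fits and 3600 trees across the folds, and a depth-20 tree may hold up to 2097151 nodes versus 65535 at depth 15. That is far more work than a single 100-tree, depth-15 fit, which is consistent with (though it does not prove) some resource limit being hit in the heavier run.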

Here is the exception I see in the logs and in Zeppelin:

org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_interpret(RemoteInterpreterService.java:266)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.interpret(RemoteInterpreterService.java:250)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.interpret(RemoteInterpreter.java:373)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:97)
at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:406)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
