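I am trying to access the Glue Catalog from Scala code for the first time.
I already ran into trouble when building the project with Maven (the question "How to set up a local development environment for Scala Spark ETL to run in AWS Glue?" helped a lot with that).
But now I am trying to run my code in an EMR cluster, and I get a java.lang.NoClassDefFoundError.
Here is my code: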
import com.amazonaws.services.glue.util.JsonOptions
import com.amazonaws.services.glue.{DynamicFrame, DynamicRecord, GlueContext}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
import org.slf4j.LoggerFactory
import org.apache.spark.sql.functions.{col, month, year}

object JoinAndRelation {
  private val logger = LoggerFactory.getLogger(getClass)

  def main(sysArgs: Array[String]): Unit = {
    // Spark session creation with connection to Glue Catalog
    implicit val spark: SparkSession = SparkSession
      .builder
      .config(new SparkConf().setAppName("TestGlueAccess"))
      .getOrCreate()

    val sc: SparkContext = spark.sparkContext
    val glueContext: GlueContext = new GlueContext(sc)
    ...
And this is the error:
19/02/08 15:35:26 INFO Client:
client token: N/A
diagnostics: User class threw exception: java.lang.NoClassDefFoundError: com/amazonaws/services/glue/GlueContext
at org.sergio.poc.JoinAndRelation$.main(JoinAndRelation.scala:41)
at org.sergio.poc.JoinAndRelation.main(JoinAndRelation.scala)
I was able to compile by adding glue-assembly.jar as a Maven dependency, and I also tried adding aws-java-sdk-core, but that did not help:
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>glue-assembly</artifactId>
  <version>1.0</version>
  <scope>system</scope>
  <systemPath>${project.basedir}/libs/glue-assembly.jar</systemPath>
</dependency>
<dependency>
  <groupId>com.amazonaws</groupId>
  <artifactId>aws-java-sdk-core</artifactId>
  <version>1.11.445</version>
</dependency>
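Since the project compiles but the class is missing at runtime, one way to narrow this down might be a small check of whether the class is visible on the runtime classpath. This is only a rough sketch: ClasspathCheck and isOnClasspath are hypothetical names made up for illustration, and the only thing taken from my setup is the class name from the stack trace.

```scala
object ClasspathCheck {
  // Hypothetical helper: returns true if the named class can be loaded
  // by the current thread's context class loader, without initializing it.
  def isOnClasspath(className: String): Boolean =
    try {
      Class.forName(className, false, Thread.currentThread().getContextClassLoader)
      true
    } catch {
      // Thrown when the class (or one of its dependencies) is not available at runtime
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }

  def main(args: Array[String]): Unit = {
    // Class name taken from the stack trace above
    val glueClass = "com.amazonaws.services.glue.GlueContext"
    println(s"$glueClass visible at runtime: ${isOnClasspath(glueClass)}")
  }
}
```

Calling isOnClasspath at the start of main, before the GlueContext is created, would at least show whether the Glue classes reach the cluster at all.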
Finally, this is the command I use to run it:
spark-submit --class org.sergio.poc.JoinAndRelation --master yarn --deploy-mode cluster --executor-memory 2G --num-executors 2 MyFirstScalaMavenProject-1.0-SNAPSHOT.jar
Has anyone run into the same problem?