java.lang.NoClassDefFoundError on the executors of a Spark job

Posted: 2017-07-31 18:20:36

Tags: scala hadoop apache-spark apache-spark-sql amazon-dynamodb

I am trying to write every record of a Hive table to DynamoDB from a Spark job. The full error is:

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 12 in stage 2.0 failed 4 times, most recent failure: Lost task 12.3 in stage 2.0 (TID 775, ip-10-0-0-xx.eu-west-1.compute.internal, executor 1): java.lang.NoClassDefFoundError: Could not initialize class com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder

The code snippet is as follows:

import scala.util.Random

import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}
import com.amazonaws.regions.Regions
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder
import com.amazonaws.services.dynamodbv2.document.{DynamoDB, Item}
import org.apache.spark.sql.SparkSession

object ObjName {

    def main(args: Array[String]): Unit = {
      // Driver-side check: print which jar this class was loaded from.
      print(classOf[AmazonDynamoDBClientBuilder].getProtectionDomain().getCodeSource().getLocation().toURI().getPath())

      val session = SparkSession.builder()
        .appName("app_name")
        .enableHiveSupport()
        .getOrCreate()
      import session.implicits._
      session.sparkContext.setLogLevel("WARN")

      session.sql("""
            select
                email,
                name
            from db.tbl
            """).rdd.repartition(40)
        .foreachPartition( iter => {
          // This closure runs on the executors, one DynamoDB client per partition.
          val random = new Random()
          val client = AmazonDynamoDBClientBuilder.standard()
            .withRegion(Regions.EU_WEST_1)
            .withCredentials(new AWSStaticCredentialsProvider(
              new BasicAWSCredentials("access key", "secret key")))
            .build()
          val dynamoDB = new DynamoDB(client)
          val table = dynamoDB.getTable("table_name")
          iter.foreach(row => {
            val item = new Item()
              .withPrimaryKey("email", row.getString(0))
              .withNumber("ts", System.currentTimeMillis * 1000 + random.nextInt(1000))
              .withString("name", row.getString(1))
            table.putItem(item)
          })
        })
    }
}

Maven dependencies:

<dependencies>
    <!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-dynamodb -->
    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk-dynamodb</artifactId>
        <version>1.11.170</version>
    </dependency>
    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk-core</artifactId>
        <version>1.11.170</version>
    </dependency>
    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk-s3</artifactId>
        <version>1.11.170</version>
    </dependency>

    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>jmespath-java</artifactId>
        <version>1.11.170</version>
    </dependency>

    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpcore</artifactId>
        <version>4.4.4</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.0</version>
        <scope>provided</scope>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive_2.11 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>2.1.0</version>
        <scope>provided</scope>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.11 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.1.0</version>
        <scope>provided</scope>
    </dependency>
</dependencies>

At the beginning of the main method, I print the location of the jar file for the class com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder, and it succeeds, which means the class is loaded fine on the driver node.

I also ran jar tvf package.jar | grep -i AmazonDynamoDBClientBuilder --color and confirmed that the class is present in my packaged jar file.
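The driver-side print and the jar listing only show that the class is visible to the driver; the failure happens on the executors, so the same check can be repeated there. Below is a minimal diagnostic sketch (the object name ClassProbe and the partition count are hypothetical, not part of the original job) that loads the class inside a partition and prints either its code source or the full stack trace. Note that a NoClassDefFoundError reading "Could not initialize class" normally means the class was found but its static initializer already failed once; the first failing task's log carries the original ExceptionInInitializerError with the real cause.

import org.apache.spark.sql.SparkSession

// Hypothetical diagnostic job (not from the question): try to load the
// DynamoDB builder class on the executors and report where it came from.
object ClassProbe {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder().appName("class_probe").getOrCreate()
    session.sparkContext.parallelize(1 to 40, 40).foreachPartition { _ =>
      try {
        val cls = Class.forName(
          "com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder")
        // Same check as on the driver, but printed to the executor's stdout.
        println("loaded from: " + cls.getProtectionDomain.getCodeSource.getLocation)
      } catch {
        // Only the first failing task shows the original ExceptionInInitializerError;
        // subsequent tasks see the bare "Could not initialize class ..." error.
        case t: Throwable => t.printStackTrace()
      }
    }
    session.stop()
  }
}

The executor stdout/stderr for these tasks (via the Spark UI or the YARN logs) then shows whether the class is missing from the executor classpath or whether its static initialization fails, for example because of a conflicting library version on the cluster.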

The command used to submit the Spark job is below. With or without --jars, it fails with the same error shown above. Any suggestions? Thanks.

spark-submit --class MainClassName \
  --jars /mnt/home/hadoop/aws-java-sdk-dynamodb-1.11.170.jar,/mnt/home/hadoop/aws-java-sdk-core-1.11.170.jar,/mnt/home/hadoop/aws-java-sdk-s3-1.11.170.jar,/mnt/home/hadoop/jmespath-java-1.11.170.jar \
  --driver-memory 3G --num-executors 20 --executor-memory 4G --executor-cores 4 \
  package.jar

0 Answers:

No answers yet.