Passing an external yml file to my spark-job/code is not working, throwing "Can't construct a java object for tag:yaml.org,2002"

Asked: 2019-09-16 15:59:37

Tags: yaml databricks snakeyaml typesafe typesafe-config

I am using Spark 2.4.1 and Java 8. I am trying to load an external properties file when submitting my Spark job with spark-submit.

I am using the TypeSafe config library below to load my properties file.

<dependency>
    <groupId>com.typesafe</groupId>
    <artifactId>config</artifactId>
    <version>1.3.1</version>
</dependency>

In my Spark driver class, MyDriver.java, I load the YML file as follows:

String ymlFilename = args[1].toString();
Optional<QueryEntities>  entities =  InputYamlProcessor.process(ymlFilename);
  

All of my code, including InputYamlProcessor.java, is here:

https://gist.github.com/BdLearnerr/e4c47c5f1dded951b18844b278ea3441

This works fine in my local environment, but it fails with an error when I run it on the cluster.

  

Error:

Can't construct a java object for tag:yaml.org,2002:com.snp.yml.QueryEntities; exception=Class not found: com.snp.yml.QueryEntities
 in 'reader', line 1, column 1:
    entities:
    ^

        at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:345)
        at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
        at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:450)
        at org.yaml.snakeyaml.Yaml.loadAs(Yaml.java:444)
        at com.snp.yml.InputYamlProcessor.process(InputYamlProcessor.java:62)
Caused by: org.yaml.snakeyaml.error.YAMLException: Class not found: com.snp.yml.QueryEntities
        at org.yaml.snakeyaml.constructor.Constructor.getClassForNode(Constructor.java:650)
        at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.getConstructor(Constructor.java:331)
        at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:341)
        ... 12 more
  

My spark-submit script is:

 $SPARK_HOME/bin/spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --name MyDriver  \
    --jars "/local/jars/*.jar" \
    --files hdfs://files/application-cloud-dev.properties,hdfs://files/column_family_condition.yml \
    --class com.sp.MyDriver \
    --executor-cores 3 \
    --executor-memory 9g \
    --num-executors 5 \
    --driver-cores 2 \
    --driver-memory 4g \
    --driver-java-options -Dconfig.file=./application-cloud-dev.properties \
    --conf spark.executor.extraJavaOptions=-Dconfig.file=./application-cloud-dev.properties \
    --conf spark.driver.extraClassPath=. \
    --driver-class-path . \
     ca-datamigration-0.0.1.jar application-cloud-dev.properties column_family_condition.yml

What am I doing wrong here? How can I fix this? Any help is much appreciated.

Tested:

To check whether the class is really missing, I ran something like the following just before the line that fails:

public static void printTest() {
    QueryEntity e1 = new QueryEntity();
    e1.setTableName("tab1");
    List<QueryEntity> li = new ArrayList<QueryEntity>();
    li.add(e1);


    QueryEntities ll = new QueryEntities();
    ll.setEntitiesList(li);

    ll.getEntitiesList().stream().forEach(e -> logger.error("e1 Name :" + e.getTableName()));


    return;
}

Output:

19/09/18 04:40:33 ERROR yml.InputYamlProcessor: e1 Name :tab1
    Can't construct a java object for tag:yaml.org,2002:com.snp.helpers.QueryEntities; exception=Class not found: com.snp.helpers.QueryEntities
             in 'reader', line 1, column 1:
                entitiesList:
         at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:345)

What is going wrong here?

1 Answer:

Answer 0 (score: 1)

This has nothing to do with the QueryEntities class itself, despite the message "YAMLException: Class not found: com.snp.yml.QueryEntities".

It is a problem with how the SnakeYAML Constructor is created.

Change to

Yaml yaml = new Yaml(new  CustomClassLoaderConstructor(com.snp.helpers.QueryEntities.class.getClassLoader()));

from

/*Constructor constructor = new Constructor(com.snp.helpers.QueryEntities.class);
        Yaml yaml = new Yaml( constructor );*/
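
A likely explanation, offered as an assumption rather than something the stack trace proves: SnakeYAML looks up the bean class by name, and whether that succeeds depends on which class loader performs the lookup. On the cluster, the loader used by the default Constructor may not be the one that loaded the application jar, even though `new QueryEntities()` works in ordinary code (as the printTest output in the question shows). CustomClassLoaderConstructor makes SnakeYAML use the loader passed to it. The sketch below uses only the JDK to show the underlying effect; `ClassLoaderLookupSketch`, `Payload`, `canLoad` and `loaderChain` are hypothetical names, and `Payload` merely stands in for a bean such as QueryEntities.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Sketch: the same class name resolves or fails depending on the loader used. */
final class ClassLoaderLookupSketch {

    /** Hypothetical stand-in for an application bean such as QueryEntities. */
    static final class Payload { }

    /** Returns true if the given loader can resolve the class name. */
    static boolean canLoad(String className, ClassLoader loader) {
        try {
            // Look the class up by name through one specific loader, without initializing it.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Lists the loader chain of a class; handy as a log line in the driver. */
    static String loaderChain(Class<?> type) {
        StringBuilder sb = new StringBuilder(type.getName()).append(" <- ");
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            sb.append(loader.getClass().getName()).append(" <- ");
            loader = loader.getParent();
        }
        return sb.append("bootstrap").toString();
    }

    public static void main(String[] args) throws Exception {
        String name = Payload.class.getName();
        // A loader with no URLs and no parent can only see JDK classes,
        // so it plays the role of "the wrong loader" here.
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], null)) {
            System.out.println("isolated loader finds it: " + canLoad(name, isolated));
            System.out.println("own loader finds it:      "
                    + canLoad(name, Payload.class.getClassLoader()));
        }
        System.out.println(loaderChain(Payload.class));
    }
}
```

Logging `loaderChain(QueryEntities.class)` next to `Thread.currentThread().getContextClassLoader()` in the driver is one way to check whether this explanation applies to a given cluster run.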