Spark java.lang.NoSuchMethodError for the Univocity CSV parser setDelimiter method

Time: 2019-07-29 05:47:30

Tags: apache-spark univocity

I'm trying to run a Scala Spark job that uses the Univocity CSV parser, and after upgrading it to support string delimiters (as opposed to single characters only), I get the following error when running my jar on the cluster. Running it locally in my IDEA IDE produces the expected results with no errors.

ERROR yarn.ApplicationMaster: User class threw exception: java.lang.NoSuchMethodError: com.univocity.parsers.csv.CsvFormat.setDelimiter(Ljava/lang/String;)V
java.lang.NoSuchMethodError: com.univocity.parsers.csv.CsvFormat.setDelimiter(Ljava/lang/String;)V
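
For context, a minimal sketch of the kind of parser setup that triggers this (the "||" delimiter value is just an example):

import com.univocity.parsers.csv.{CsvParser, CsvParserSettings}

val settings = new CsvParserSettings()
// setDelimiter(String) only exists in univocity-parsers 2.8.0+; older
// versions only have setDelimiter(char), so if an older CsvFormat class is
// on the classpath at runtime this call fails with NoSuchMethodError.
settings.getFormat.setDelimiter("||")
val parser = new CsvParser(settings)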

Here's what I've tried: I ruled out any conflicting copies of univocity-parsers by inspecting the dependency tree. Running mvn dependency:tree -Dverbose -Dincludes=com.univocity:univocity-parsers produces:

[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ preval ---
[INFO] dataqa:preval:jar:1.0-SNAPSHOT
[INFO] \- com.univocity:univocity-parsers:jar:2.8.2:compile

I also tried setting the spark.executor.userClassPathFirst=true configuration when running the Spark job, but the behavior didn't change.
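
For reference, a sketch of the spark-submit invocation with that setting (the --class value is a placeholder; the jar name is inferred from the artifact coordinates above, and cluster deploy mode matches the yarn.ApplicationMaster error):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.executor.userClassPathFirst=true \
  --class com.example.PrevalJob \
  target/preval-1.0-SNAPSHOT.jar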

Here is the dependencies section of my pom.xml:

        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>2.11.12</version>
        </dependency>
        <!--
            Spark library. spark-core_2.xx must match the scala language version
        -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.0.0</version>
        </dependency>

        <!--
            Spark SQL library. spark-sql_2.xx must match the scala language version
        -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.0.0</version>
            <exclusions>
                <exclusion>  <!-- declare the exclusion here -->
                    <groupId>com.univocity</groupId>
                    <artifactId>univocity-parsers</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <!--
            Library to make REST API call
        -->
        <dependency>
            <groupId>com.typesafe.play</groupId>
            <artifactId>play-ahc-ws-standalone_2.11</artifactId>
            <version>2.0.0-M1</version>
        </dependency>


        <!--
            Parses delimited files
        -->
        <dependency>
            <groupId>com.univocity</groupId>
            <artifactId>univocity-parsers</artifactId>
            <version>2.8.2</version>
            <type>jar</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/com.googlecode.json-simple/json-simple -->
        <dependency>
            <groupId>com.googlecode.json-simple</groupId>
            <artifactId>json-simple</artifactId>
            <version>1.1.1</version>
        </dependency>

I'm wondering whether Spark ships its own built-in copy of this dependency that overrides my version (2.8 is the first version to support a String argument; earlier versions only supported a single char).
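
One way to check which copy wins at runtime is to log where the JVM actually loaded the class from; a minimal diagnostic sketch, added temporarily to the job code:

import com.univocity.parsers.csv.CsvFormat

// Prints the jar that CsvFormat was loaded from; if this points at a Spark
// assembly jar rather than my own dependency, the cluster's bundled copy wins.
println(classOf[CsvFormat].getProtectionDomain.getCodeSource.getLocation)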

Any insights?

2 answers:

Answer 0 (score: 0)

A bit late, but if using --conf spark.driver.extraClassPath and --conf spark.executor.extraClassPath is an option for you, see my response here.
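
A sketch of that alternative (all paths and names are placeholders; for the executor setting, the jar must already exist at the same path on every worker node, since extraClassPath entries are prepended to the JVM classpath rather than shipped with the job):

spark-submit \
  --conf spark.driver.extraClassPath=/path/to/univocity-parsers-2.8.2.jar \
  --conf spark.executor.extraClassPath=/path/to/univocity-parsers-2.8.2.jar \
  --class com.example.MyJob \
  my-app.jar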

Answer 1 (score: 0)

After spending a lot of time troubleshooting, I found the solution. I had to use the maven-shade-plugin as described here: https://www.cloudera.com/documentation/enterprise/5-13-x/topics/spark_building.html#relocation

Here is the relevant part of the maven-shade-plugin definition I had to add to my pom.xml:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.1</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>

                <!-- other non-relevant filters etc. omitted for brevity -->

                <relocations>
                    <!-- used to make sure there are no conflicts between the univocity parser version used here and the one that is bundled with spark -->
                    <relocation>
                        <pattern>com.univocity.parsers</pattern>
                        <shadedPattern>com.shaded.parsers</shadedPattern>
                    </relocation>
                </relocations>
            </configuration>
        </execution>
    </executions>
</plugin>
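
The relocation is what actually fixes the conflict: the shade plugin rewrites the bytecode of the classes in the shaded jar so that every reference to com.univocity.parsers points to com.shaded.parsers instead, meaning the copy bundled with Spark can no longer be loaded in place of the 2.8.2 classes. You can confirm the rewrite by listing the shaded jar (jar name assumed from the artifact coordinates above) with jar tf target/preval-1.0-SNAPSHOT.jar | grep parsers and checking that the classes appear under com/shaded/parsers.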