javax.xml.stream.XMLStreamException: Trying to output second root (Spark-XML Spark program)

Time: 2018-05-02 09:49:08

Tags: apache-spark apache-spark-sql datastax-enterprise apache-spark-xml

I am trying to run this small spark-xml example, and it fails with an exception when I launch it with spark-submit.

Example repo: https://github.com/punithmailme/spark-xml-new

Command: ./dse spark-submit --class MainDriver /Users/praj3/Desktop/projects/spark/main/build/libs/main.jar

Caused by: javax.xml.stream.XMLStreamException: Trying to output second root, <n:Brand>
    at com.ctc.wstx.sw.BaseStreamWriter.throwOutputError(BaseStreamWriter.java:1537) ~[woodstox-core-asl-4.4.1.jar:4.4.1]
    at com.ctc.wstx.sw.BaseStreamWriter.throwOutputError(BaseStreamWriter.java:1544) ~[woodstox-core-asl-4.4.1.jar:4.4.1]
    at com.ctc.wstx.sw.BaseStreamWriter.reportNwfStructure(BaseStreamWriter.java:1572) ~[woodstox-core-asl-4.4.1.jar:4.4.1]
    at com.ctc.wstx.sw.BaseNsStreamWriter.checkStartElement(BaseNsStreamWriter.java:469) ~[woodstox-core-asl-4.4.1.jar:4.4.1]
    at com.ctc.wstx.sw.BaseNsStreamWriter.writeStartElement(BaseNsStreamWriter.java:290) ~[woodstox-core-asl-4.4.1.jar:4.4.1]
    at com.sun.xml.internal.txw2.output.DelegatingXMLStreamWriter.writeStartElement(DelegatingXMLStreamWriter.java:45) ~[na:1.8.0_144]
    at com.sun.xml.internal.txw2.output.IndentingXMLStreamWriter.writeStartElement(IndentingXMLStreamWriter.java:148) ~[na:1.8.0_144]
    at com.databricks.spark.xml.parsers.StaxXmlGenerator$.apply(StaxXmlGenerator.scala:128) ~[main.jar:na]
    at com.databricks.spark.xml.util.XmlFile$$anonfun$1$$anon$1.next(XmlFile.scala:108) ~[main.jar:na]
    at com.databricks.spark.xml.util.XmlFile$$anonfun$1$$anon$1.next(XmlFile.scala:96) ~[main.jar:na]
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) ~[scala-library-2.11.11.jar:na]

Exception (see the stack trace above).

[group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.1.2'],
            [group: 'org.projectlombok', name: 'lombok', version: lombokVersion],
            [group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.0.7'],
            [group: 'com.fasterxml.jackson.module', name: 'jackson-module-scala_2.11', version: '2.9.5'],
            [group: 'com.fasterxml.jackson.core', name: 'jackson-core', version: '2.9.5'],
            [group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.9.5'],
    )

    group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.1.2') {
        exclude group: 'org.slf4j', module: 'slf4j-log4j12' //because of log4j and slf conflict
    }


            [group: 'com.databricks', name: 'spark-csv_2.11', version: '1.5.0'],
            [group: 'com.databricks', name: 'spark-xml_2.11', version: '0.4.1']
    )

Environment and dependencies: DataStax Enterprise 5.1.8 on Mac, with the Gradle dependencies listed above.

DSE 5.1.8 components

  • Apache Cassandra™ 3.11.1.2261
  • Apache Solr™ 6.0.1.0.2224
  • Apache Spark™ 2.0.2.17
  • DSE Java Driver 1.2.6
  • Spark Jobserver 0.6.2.237

When I run this from a main method as a single-threaded program, it works; it fails only under spark-submit!!!
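
The stack trace shows Woodstox frames (woodstox-core-asl-4.4.1) underneath the JDK's IndentingXMLStreamWriter. One thing that may be worth checking is whether spark-submit resolves a different StAX implementation than the plain main-method run does. This is an assumption, not something confirmed in this thread. Below is a minimal diagnostic sketch; the class name StaxProbe and its method names are hypothetical and not part of the example repo. It prints which XMLOutputFactory implementation the JVM picks up on the current classpath, and whether that writer rejects a second top-level element.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper for diagnosis only; not part of spark-xml or the example repo.
public class StaxProbe {

    // Class name of the XMLOutputFactory implementation resolved on this classpath.
    public static String outputFactoryClassName() {
        return XMLOutputFactory.newInstance().getClass().getName();
    }

    // Returns true if the resolved writer throws when a second root element is started.
    public static boolean rejectsSecondRoot() {
        try {
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(new StringWriter());
            writer.writeStartElement("Brand");
            writer.writeEndElement();
            writer.writeStartElement("Brand"); // second top-level element
            writer.writeEndElement();
            writer.close();
            return false;
        } catch (XMLStreamException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("XMLOutputFactory: " + outputFactoryClassName());
        System.out.println("Rejects second root: " + rejectsSecondRoot());
    }
}
```

Running this once from the IDE and once through spark-submit with the same jar, then comparing the two outputs, would show whether the two launches differ in their StAX writer.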

1 Answer:

Answer 0: (score: 0)

I tried the sample code on a YARN cluster, and it runs fine on AWS EMR with S3.

https://github.com/mkanchwala/spark-databricks-example

Please give it a try and let me know.