I am trying to run this small spark-xml example, and it fails with the exception below when I run it through spark-submit.

Example repo: https://github.com/punithmailme/spark-xml-new

Command: ./dse spark-submit --class MainDriver /Users/praj3/Desktop/projects/spark/main/build/libs/main.jar
Caused by: javax.xml.stream.XMLStreamException: Trying to output second root, <n:Brand>
at com.ctc.wstx.sw.BaseStreamWriter.throwOutputError(BaseStreamWriter.java:1537) ~[woodstox-core-asl-4.4.1.jar:4.4.1]
at com.ctc.wstx.sw.BaseStreamWriter.throwOutputError(BaseStreamWriter.java:1544) ~[woodstox-core-asl-4.4.1.jar:4.4.1]
at com.ctc.wstx.sw.BaseStreamWriter.reportNwfStructure(BaseStreamWriter.java:1572) ~[woodstox-core-asl-4.4.1.jar:4.4.1]
at com.ctc.wstx.sw.BaseNsStreamWriter.checkStartElement(BaseNsStreamWriter.java:469) ~[woodstox-core-asl-4.4.1.jar:4.4.1]
at com.ctc.wstx.sw.BaseNsStreamWriter.writeStartElement(BaseNsStreamWriter.java:290) ~[woodstox-core-asl-4.4.1.jar:4.4.1]
at com.sun.xml.internal.txw2.output.DelegatingXMLStreamWriter.writeStartElement(DelegatingXMLStreamWriter.java:45) ~[na:1.8.0_144]
at com.sun.xml.internal.txw2.output.IndentingXMLStreamWriter.writeStartElement(IndentingXMLStreamWriter.java:148) ~[na:1.8.0_144]
at com.databricks.spark.xml.parsers.StaxXmlGenerator$.apply(StaxXmlGenerator.scala:128) ~[main.jar:na]
at com.databricks.spark.xml.util.XmlFile$$anonfun$1$$anon$1.next(XmlFile.scala:108) ~[main.jar:na]
at com.databricks.spark.xml.util.XmlFile$$anonfun$1$$anon$1.next(XmlFile.scala:96) ~[main.jar:na]
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410) ~[scala-library-2.11.11.jar:na]
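For context, the trace above comes out of spark-xml's write path (StaxXmlGenerator writing rows through the Woodstox stream writer). The repo's MainDriver is not reproduced here; the snippet below is only a minimal sketch of the kind of spark-xml read/write that exercises that code path. The paths and the rootTag value are hypothetical, and the n:Brand row tag is taken only from the exception message:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class XmlWriteSketch {
    public static void main(String[] args) {
        // The master is expected to be supplied by spark-submit / dse spark-submit.
        SparkSession spark = SparkSession.builder()
                .appName("spark-xml write sketch")
                .getOrCreate();

        // Hypothetical input path; "n:Brand" is the row element named in the exception.
        Dataset<Row> brands = spark.read()
                .format("com.databricks.spark.xml")
                .option("rowTag", "n:Brand")
                .load("/tmp/brands-input.xml");

        // This write is what ends up in StaxXmlGenerator and the Woodstox stream writer.
        brands.write()
                .format("com.databricks.spark.xml")
                .option("rootTag", "n:Brands")   // hypothetical root element
                .option("rowTag", "n:Brand")
                .save("/tmp/brands-output");

        spark.stop();
    }
}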
Environment and dependencies: DataStax Enterprise 5.1.8 on Mac, with the following Gradle dependencies:

compile(
        [group: 'org.apache.spark', name: 'spark-sql_2.11', version: '2.1.2'],
        [group: 'org.projectlombok', name: 'lombok', version: lombokVersion],
        [group: 'com.datastax.spark', name: 'spark-cassandra-connector_2.11', version: '2.0.7'],
        [group: 'com.fasterxml.jackson.module', name: 'jackson-module-scala_2.11', version: '2.9.5'],
        [group: 'com.fasterxml.jackson.core', name: 'jackson-core', version: '2.9.5'],
        [group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.9.5']
)

compile(group: 'org.apache.spark', name: 'spark-core_2.11', version: '2.1.2') {
    exclude group: 'org.slf4j', module: 'slf4j-log4j12' // because of a log4j / slf4j conflict
}

compile(
        [group: 'com.databricks', name: 'spark-csv_2.11', version: '1.5.0'],
        [group: 'com.databricks', name: 'spark-xml_2.11', version: '0.4.1']
)
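For reference, the same spark-xml artifact can also be handed to the job at submit time with --packages instead of being bundled into main.jar; the coordinates below are simply the ones from the dependency list above, and this assumes dse spark-submit passes standard spark-submit options through. It is shown only as an alternative way of supplying the dependency, not as a confirmed fix for the exception:

./dse spark-submit --packages com.databricks:spark-xml_2.11:0.4.1 --class MainDriver /Users/praj3/Desktop/projects/spark/main/build/libs/main.jar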
DSE 5.1.8 components

When I run this directly as a main method in a single thread it works fine; it only fails when launched through spark-submit!
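A minimal sketch of what "running it as a main method in a single thread" usually means, with a hard-coded local master (hypothetical, not the repo's actual MainDriver); under dse spark-submit the master comes from the launcher instead and the job runs on the cluster's executors:

import org.apache.spark.sql.SparkSession;

public class LocalRunSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("MainDriver-local")
                .master("local[1]")   // single local thread; dropped when submitting to a cluster
                .getOrCreate();

        // ... the same spark-xml read/write as in the sketch above ...

        spark.stop();
    }
}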
Answer 0 (score 0):
I have tried the sample code on a YARN cluster, and it also runs fine on AWS EMR with S3.

https://github.com/mkanchwala/spark-databricks-example

Please give it a try and let me know.
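For what it's worth, a submit on an EMR/YARN cluster would look roughly like the line below; the jar location is hypothetical and depends on how the example is built and copied onto the cluster:

spark-submit --master yarn --deploy-mode cluster --class MainDriver /home/hadoop/main.jar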