I wrote the following code to query Solr and run some operations over the resulting document set using Spark and solrj:
// Build the Solr query from the rendered job parameters.
SolrQuery sq = new SolrQuery();
sq.set(key, JobUtils.removeFrontEndQuotesWithBackSlash(queryParams.get(key).render()));

// Query the Solr shards and collect the matching documents into an RDD.
JavaRDD<SolrDocument> tempRDD = solrRDD.queryShardsBIL(sq,
        paramsObj.get("splitField").render().replaceAll("\"", ""),
        Integer.parseInt(paramsObj.get("splitsPerShard").render().replaceAll("\"", "")),
        paramsObj.get("exportHandler").render().replaceAll("\"", ""));
combinedRDD = combinedRDD.union(tempRDD);

// Map and reduce over the documents, then index each partition back into Solr.
combinedRDD.mapToPair(new SolrJobMapper1(jobConfig))
        .reduceByKey(new SolrJobReducer1(jobConfig))
        .foreachPartition(new SolrJobPartitionIndexer1(
                JobUtils.removeFrontEndQuotes(paramsObj.get("zkHost").render()),
                JobUtils.removeFrontEndQuotes(paramsObj.get("solrCollection").render()),
                Boolean.parseBoolean(JobUtils.removeFrontEndQuotes(paramsObj.get("doCommit").render())),
                accum,
                JobUtils.removeFrontEndQuotes(paramsObj.get("uniqueIdField").render())));
When I run the job on the Spark server, I get the following error:
java.io.InvalidClassException: org.apache.solr.client.solrj.SolrQuery; local class incompatible: stream classdesc serialVersionUID = -323500251212286545, local class serialVersionUID = -7606622609766730986
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1623)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:194)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
If I run it locally from the main method, it works fine. I am using the same solrj-6.1.0 in both environments. What am I missing here?
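For what it's worth, the exception reports two different serialVersionUID values for SolrQuery, which means the driver serialized the object against one version of the solrj classes while the executor deserialized it against another. A minimal sketch for checking which jar each JVM actually resolves (the class name is just illustrative; run it on the driver and again inside an executor task and compare the output):

import java.io.ObjectStreamClass;

import org.apache.solr.client.solrj.SolrQuery;

public class SolrQueryVersionCheck {
    public static void main(String[] args) {
        // Print the serialVersionUID and the jar location this JVM loads
        // for SolrQuery. If the values differ between the driver and the
        // executors, they are not seeing the same solrj jar.
        ObjectStreamClass desc = ObjectStreamClass.lookup(SolrQuery.class);
        System.out.println("serialVersionUID = " + desc.getSerialVersionUID());
        System.out.println("loaded from      = "
                + SolrQuery.class.getProtectionDomain().getCodeSource().getLocation());
    }
}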
Answer 0 (score: 0)
I don't think SolrQuery can be serialized by Spark. It would be interesting to see the source of your SolrJobMapper1 and SolrJobPartitionIndexer1 classes, since I'm not sure how you implemented the Solr query; but for this kind of job I use, and strongly recommend, the Lucidworks Spark/Solr Integration.