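I want to run a Spark job on a Google Cloud VM cluster, and inside a map operation I need to query Elasticsearch. My problem is that Spark and Elasticsearch conflict over the Guava library: Spark uses Guava 14 and ES uses Guava 18.
The problem is this method call:
com.google.common.util.concurrent.MoreExecutors.directExecutor()
which exists in Guava 18 but not in Guava 14.
In more detail, the job I am trying to run looks like this: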
input.map(record => {
  val client = openConnection()
  val newdata = client.query(record.someInfo)
  new record(newdata)
})
The openConnection method looks like this:
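import java.net.InetAddress;
import java.net.UnknownHostException;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

// Builds a transport client for the "elasticsearch" cluster and points it at one node.
public static TransportClient openConnection(String ipAddress, int ipPort) throws UnknownHostException {
    Settings settings = Settings.settingsBuilder().put("cluster.name", "elasticsearch").build();
    TransportClient client = TransportClient.builder().settings(settings).build()
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName(ipAddress), ipPort));
    return client;
}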
I tried to use shading to force ES to use Guava 18 by adding a shade rule to my sbt file, as follows:
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" % "spark-graphx_2.10" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.0" % "provided"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.6.0" % "provided"
libraryDependencies += "org.elasticsearch" % "elasticsearch" % "2.2.0"
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.common.*" -> "googlecommona.@1").
    inLibrary("org.elasticsearch" % "elasticsearch" % "2.2.0"))
But the problem still persists. Is there a way to solve this?
Answer 0 (score: 1)
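You cannot exclude transitive dependencies from a provided dependency.
By marking a dependency as provided, you tell the packager not to put it in the final jar, because you assume it is already on the classpath of the environment where the jar will be deployed.
So you cannot exclude its transitive dependencies, because the whole dependency has itself already been excluded.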
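Your option is shading, which is available through the sbt-assembly SBT plugin. Shading is the renaming of classes. You know that ElasticSearch and Spark each depend on their own version of Guava. So you instruct SBT to rename the Guava provided by ES to Guava1 and the Guava provided by Spark to Guava2, and to rewrite every reference in ES to point to Guava1 and every reference in Spark to point to Guava2. You can read about shading in the sbt-assembly documentation.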
Answer 1 (score: 1)
Shading is the answer: I added the following rule to my build.sbt file.
The solution below works for a Spark cluster on YARN that uses the ElasticSearch TransportClient class.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("com.google.**" -> "googlecommona.@1").inAll
)
For completeness, here is the whole sbt file:
import sbt.ExclusionRule
import sbt.Keys._

lazy val root = (project in file(".")).
  settings(
    name := "scala_code",
    version := "1.0",
    scalaVersion := "2.10.6",
    conflictManager := ConflictManager.latestRevision,
    test in assembly := {},
    assemblyMergeStrategy in assembly := {
      case PathList("META-INF", "MANIFEST.MF") => MergeStrategy.discard
      case _ => MergeStrategy.first
    },
    parallelExecution in test := false,
    libraryDependencies += "com.fasterxml.jackson.module" % "jackson-module-scala_2.10" % "2.6.5",
    libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.0" % "provided" exclude("javax.servlet", "servlet-api"),
    libraryDependencies += "org.wikidata.wdtk" % "wdtk-datamodel" % "0.6.0" exclude ("com.fasterxml.jackson.core", "jackson-annotations"),
    libraryDependencies += "org.apache.spark" % "spark-graphx_2.10" % "1.6.0" % "provided",
    libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.0" % "provided",
    libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.6.0" % "provided",
    libraryDependencies += "org.scalatest" % "scalatest_2.10" % "2.0" % "test",
    libraryDependencies += "com.typesafe" % "config" % "1.2.1",
    libraryDependencies += "org.jsoup" % "jsoup" % "1.8.3",
    libraryDependencies += "org.elasticsearch" % "elasticsearch" % "2.2.0", // exclude("com.google.guava", "guava"),
    // Relocate every com.google.* class in the assembly so ES no longer sees Spark's Guava 14
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("com.google.**" -> "googlecommona.@1").inAll
    )
  )
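To check at runtime which Guava classes are actually visible, something like the following Java sketch might help. It is not part of the original build: the class name GuavaCheck and its two helper methods are hypothetical, and the relocated name googlecommona.common.util.concurrent.MoreExecutors is an assumption derived by applying the shade rule above to the original class name. It uses only reflection, so it compiles without Guava on the classpath.

```java
import java.security.CodeSource;

public class GuavaCheck {

    /** True if the class can be loaded and has a public no-argument method with this name. */
    public static boolean hasNoArgMethod(String className, String methodName) {
        try {
            Class.forName(className).getMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    /** Location (jar or directory) the class was loaded from, or a placeholder string. */
    public static String locationOf(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap class path)" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Original Guava name, and the name assumed to result from the
        // "com.google.**" -> "googlecommona.@1" rule (an assumption, not verified here).
        String[] candidates = {
            "com.google.common.util.concurrent.MoreExecutors",
            "googlecommona.common.util.concurrent.MoreExecutors"
        };
        for (String cls : candidates) {
            System.out.println(cls);
            System.out.println("  directExecutor() present: " + hasNoArgMethod(cls, "directExecutor"));
            System.out.println("  loaded from: " + locationOf(cls));
        }
    }
}
```

Running it with the assembled jar and the cluster's Spark jars on the classpath should show which jar each name resolves to. One caveat: shading tools may also rewrite string literals that look like class names, so it is probably safer to compile this check separately rather than package it inside the shaded jar.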