I'm using xsbt-proguard-plugin, an SBT plugin for working with ProGuard.
I'm trying to come up with a ProGuard configuration for a Hive Deserializer I've written, which has the following dependencies:
// project/Dependencies.scala
val hadoop = "org.apache.hadoop" % "hadoop-core" % V.hadoop
val hive = "org.apache.hive" % "hive-common" % V.hive
val serde = "org.apache.hive" % "hive-serde" % V.hive
val httpClient = "org.apache.httpcomponents" % "httpclient" % V.http
val logging = "commons-logging" % "commons-logging" % V.logging
val specs2 = "org.specs2" %% "specs2" % V.specs2 % "test"
Plus the unmanaged dependency:
// lib/UserAgentUtils-1.6.jar
Because most of these are either used for local unit testing or are available in the Hadoop / Hive environment anyway, I want my minified jarfile to include only:
org.apache.httpcomponents.httpclient
commons-logging
lib/UserAgentUtils-1.6.jar
But I'm really struggling to get the syntax right. Should I start with a whitelist of classes I want to keep, or explicitly filter out the Hadoop / Hive / Serde / Specs2 libraries? I'm aware of this SO question, but it doesn't seem to apply here.
If I try the whitelist approach first:
// Should be equivalent to sbt> package
import ProguardPlugin._

lazy val proguard = proguardSettings ++ Seq(
  proguardLibraryJars := Nil,
  proguardOptions := Seq(
    "-keepattributes *Annotation*,EnclosingMethod",
    "-dontskipnonpubliclibraryclassmembers",
    "-dontoptimize",
    "-dontshrink",
    "-keep class com.snowplowanalytics.snowplow.hadoop.hive.SnowPlowEventDeserializer",
    "-keep class com.snowplowanalytics.snowplow.hadoop.hive.SnowPlowEventStruct"
  )
)
Then I get a Hadoop processing error, so clearly ProGuard is still trying to bundle Hadoop:
proguard: java.lang.IllegalArgumentException: Can't find common super class of [[Lorg/apache/hadoop/fs/FileStatus;] and [[Lorg/apache/hadoop/fs/s3/Block;]
Meanwhile, if I try ProGuard's filtering syntax to build a blacklist of the libraries I don't want to include:
import ProguardPlugin._

lazy val proguard = proguardSettings ++ Seq(
  proguardLibraryJars := Nil,
  proguardOptions := Seq(
    "-keepattributes *Annotation*,EnclosingMethod",
    "-dontskipnonpubliclibraryclassmembers",
    "-dontoptimize",
    "-dontshrink",
    "-injars !*hadoop*.jar"
  )
)
Then that doesn't seem to work either:
proguard: java.io.IOException: Can't read [/home/dev/snowplow-log-deserializers/!*hadoop*.jar] (No such file or directory)
Any help greatly appreciated!
Answer 0 (score: 1)
Whitelisting is the right approach: ProGuard should get the complete context, so it can properly strip out unwanted classes, fields, and methods.
The error "Can't find common super class" suggests that some library is still missing from the input. ProGuard would normally warn about this, but the configuration apparently contains the option -ignorewarnings or -dontwarn (which should be avoided). You should add the library with -injars or -libraryjars.
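That suggestion maps naturally onto the plugin's own settings: environment-provided jars can be handed to ProGuard as library jars (class-hierarchy context only) rather than program jars (bundled into the output). A minimal sketch; the file paths are hypothetical placeholders and will differ in a real sbt layout:

```scala
import ProguardPlugin._

// Sketch only: pass the Hadoop/Hive jars to ProGuard as library jars so they
// supply the class hierarchy without being copied into the shrunk jar.
// Both file paths below are hypothetical placeholders.
lazy val proguard = proguardSettings ++ Seq(
  proguardLibraryJars ++= Seq(
    file("lib_managed/hadoop-core.jar"), // placeholder path
    file("lib_managed/hive-serde.jar")   // placeholder path
  ),
  proguardOptions := Seq(
    "-keepattributes *Annotation*,EnclosingMethod",
    "-dontoptimize",
    "-dontshrink",
    "-keep class com.snowplowanalytics.snowplow.hadoop.hive.SnowPlowEventDeserializer",
    "-keep class com.snowplowanalytics.snowplow.hadoop.hive.SnowPlowEventStruct"
  )
)
```

With the Hadoop classes visible as library jars, ProGuard can resolve the common super classes it was complaining about without bundling them.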
If ProGuard includes some classes in the output that you didn't expect, you can get an explanation with "-whyareyoukeeping class somepackage.SomeUnexpectedClass".
Starting from a working configuration, you can still try to filter out classes or entire jars from the input. Filters are added to items in the class path, not on their own, e.g. "-injars some.jar(!somepackage/**.class)" — cf. the manual. This can be useful, for instance, if the input contains test classes that drag in other unwanted classes.
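Translated into the plugin's proguardOptions, the manual's filter syntax attaches to a concrete jar entry; a sketch, where the jar name and filtered package are placeholders:

```scala
// Sketch only: the filter is appended to a specific input jar, never passed
// as a bare pattern like "-injars !*hadoop*.jar". Names are placeholders.
proguardOptions ++= Seq(
  "-injars my-deserializer.jar(!org/apache/hadoop/**.class)"
)
```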
Answer 1 (score: 0)
In the end, I couldn't get past duplicate class errors with ProGuard, let alone figure out how to filter out the relevant jars, so I switched to the much cleaner sbt-assembly approach:
1. Added the sbt-assembly plugin to my project, per the plugin's setup instructions.
2. Updated the relevant project dependencies with the "provided" flag to stop them being added to my fat jar:
val hadoop = "org.apache.hadoop" % "hadoop-core" % V.hadoop % "provided"
val hive = "org.apache.hive" % "hive-common" % V.hive % "provided"
val serde = "org.apache.hive" % "hive-serde" % V.hive % "provided"
val httpClient = "org.apache.httpcomponents" % "httpclient" % V.http
val httpCore = "org.apache.httpcomponents" % "httpcore" % V.http
val logging = "commons-logging" % "commons-logging" % V.logging % "provided"
val specs2 = "org.specs2" %% "specs2" % V.specs2 % "test"
3. Added an sbt-assembly configuration like this:
import sbtassembly.Plugin._
import AssemblyKeys._

lazy val sbtAssemblySettings = assemblySettings ++ Seq(
  assembleArtifact in packageScala := false,
  jarName in assembly <<= (name, version) { (name, version) => name + "-" + version + ".jar" },
  mergeStrategy in assembly <<= (mergeStrategy in assembly) {
    (old) => {
      case "META-INF/NOTICE.txt"  => MergeStrategy.discard
      case "META-INF/LICENSE.txt" => MergeStrategy.discard
      case x => old(x)
    }
  }
)
Then running assembly produced a "fat jar" containing just the packages I needed: the unmanaged dependency was included, and Hadoop / Hive etc. were excluded.
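As a sanity check, the resulting jar's contents can be listed to confirm the "provided" dependencies stayed out; a sketch, with a hypothetical jar name following the name + "-" + version pattern above:

```shell
# Build the fat jar, then inspect it; Hadoop/Hive entries should be absent.
# The jar path is a hypothetical example.
sbt assembly
jar tf target/snowplow-log-deserializers-0.1.0.jar | grep 'org/apache/hadoop' \
  || echo "no Hadoop classes bundled"
```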