I'm trying to use AvroParquetWriter to convert files in Avro format into Parquet files. I load the schema:
val schema: org.apache.avro.Schema = ... getSchema(...)
val parquetFile = new Path("Location/for/parquetFile.txt")
val writer = new AvroParquetWriter[GenericRecord](parquetFile, schema)
My code runs fine until the AvroParquetWriter is initialized; at that point it throws this error:
> java.lang.RuntimeException: java.io.FileNotFoundException:
> java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
> at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:722)
> at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:256)
> at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:273)
> at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:767)
> at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.<init>(RawLocalFileSystem.java:235)
> ...etc
The advice it gives, and the advice I've found elsewhere, is about how to fix this when you're running a Hadoop cluster on your machine. But I'm not running a Hadoop cluster, and that isn't my goal. I do import some libraries in my SBT file for various other parts of my program, but that shouldn't start a local cluster.
This only started happening recently. Of my two coworkers, one can run this without any problem, while the other has just started hitting the same issue I have. Here is the relevant part of my build.sbt:
lazy val root = (project in file("."))
  .settings(
    commonSettings,
    name := "My project",
    version := "0.1",
    libraryDependencies ++= Seq(
      "org.apache.hadoop" % "hadoop-common" % "2.9.0",
      "com.typesafe.akka" %% "akka-actor" % "2.5.2",
      "com.lightbend.akka" %% "akka-stream-alpakka-s3" % "0.9",
      "com.enragedginger" % "akka-quartz-scheduler_2.12" % "1.6.0-akka-2.4.x",
      "com.typesafe.akka" % "akka-agent_2.12" % "2.5.2",
      "com.typesafe.akka" % "akka-remote_2.12" % "2.5.2",
      "com.typesafe.akka" % "akka-stream_2.12" % "2.5.2",
      "org.apache.kafka" % "kafka-clients" % "0.10.2.1",
      "com.typesafe.akka" %% "akka-stream-kafka" % "0.16",
      "com.typesafe.akka" %% "akka-persistence" % "2.5.2",
      "org.iq80.leveldb" % "leveldb" % "0.7",
      "org.fusesource.leveldbjni" % "leveldbjni-all" % "1.8",
      "javax.mail" % "javax.mail-api" % "1.5.6",
      "com.sun.mail" % "javax.mail" % "1.5.6",
      "commons-io" % "commons-io" % "2.5",
      "org.apache.avro" % "avro" % "1.8.1",
      "net.liftweb" % "lift-json_2.12" % "3.1.0-M1",
      "com.google.code.gson" % "gson" % "2.8.1",
      "org.json4s" %% "json4s-jackson" % "3.5.2",
      "com.amazonaws" % "aws-java-sdk-s3" % "1.11.149",
      //"com.amazonaws" % "aws-java-sdk" % "1.11.286",
      "org.scalikejdbc" %% "scalikejdbc" % "3.0.0",
      "org.scalikejdbc" %% "scalikejdbc-config" % "3.0.0",
      "org.scalikejdbc" % "scalikejdbc-interpolation_2.12" % "3.0.2",
      "com.microsoft.sqlserver" % "mssql-jdbc" % "6.1.0.jre8",
      "org.apache.commons" % "commons-pool2" % "2.4.2",
      "commons-pool" % "commons-pool" % "1.6",
      "com.jcraft" % "jsch" % "0.1.54",
      "ch.qos.logback" % "logback-classic" % "1.2.3",
      "com.typesafe.scala-logging" %% "scala-logging" % "3.7.2",
      "org.scalactic" %% "scalactic" % "3.0.4",
      "mysql" % "mysql-connector-java" % "8.0.8-dmr",
      "org.scalatest" %% "scalatest" % "3.0.4" % "test"
    )
  )
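(One thing worth noting: AvroParquetWriter itself lives in the parquet-avro artifact, which doesn't appear in this list, so presumably it's pulled in via commonSettings or transitively. For reference, the coordinate would look something like the following; the version here is just an example.)
"org.apache.parquet" % "parquet-avro" % "1.9.0"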
Any ideas as to why this won't run with the Hadoop-related dependencies?
Answer (score: 0)
The answer was to follow the error's own suggestion. (On Windows, hadoop-common's RawLocalFileSystem shells out to winutils.exe to set file permissions, as the setPermission frames in the stack trace show, so it needs HADOOP_HOME or hadoop.home.dir set even when no cluster is involved.)
I downloaded the latest version of winutils.exe from https://github.com/steveloughran/winutils/tree/master/hadoop-3.0.0/bin
Then I manually created this directory structure: C:/Users/MyName/Hadoop/bin. Note that the bin directory has to be there: you can name the Hadoop/ directory whatever you like, but bin/ must be one level inside it. I took winutils.exe and put it into bin.
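As a quick sanity check (just a sketch, assuming the C:/Users/MyName/Hadoop location above), you can verify the layout before running any Hadoop code:
import java.io.File

// winutils.exe must sit one level down, under bin/.
val winutils = new File("C:/Users/MyName/Hadoop/bin/winutils.exe")
require(winutils.exists(), s"winutils.exe not found at ${winutils.getAbsolutePath}")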
In my code, I had to put this line above the initialization of the parquet writer (I imagine it can go anywhere before the initialization) to set the Hadoop home:
System.setProperty("hadoop.home.dir", "C:/Users/nhanak/Hadoop/")
val writer = new AvroParquetWriter[GenericRecord](parquetFile, iInfo.schema)
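Putting it together, here is a minimal sketch of the whole flow; the schema literal, record field, and output path are made up for illustration, and it uses the same two-argument AvroParquetWriter constructor as above:
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetWriter

// Point Hadoop at the directory containing bin/winutils.exe before any
// filesystem code runs.
System.setProperty("hadoop.home.dir", "C:/Users/MyName/Hadoop/")

// A made-up schema, just for this example.
val schema = new Schema.Parser().parse(
  """{"type":"record","name":"Example","fields":[{"name":"id","type":"long"}]}""")

val writer = new AvroParquetWriter[GenericRecord](new Path("example.parquet"), schema)
try {
  val record = new GenericData.Record(schema)
  record.put("id", 1L)
  writer.write(record)
} finally {
  writer.close()
}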
Alternatively, you can keep winutils.exe inside the project itself: create src/main/resources/HadoopResources/bin and put winutils.exe in that bin. Then, to use winutils.exe, you need to set the Hadoop home like this:
import java.io.File

// Resolves relative to the working directory, so run from the project root.
val file = new File("src/main/resources/HadoopResources")
System.setProperty("hadoop.home.dir", file.getAbsolutePath)
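A side benefit of the in-project approach is that winutils.exe travels with the repository, so each developer doesn't have to set up their own C:/Users/.../Hadoop directory, which helps with exactly the "works on one coworker's machine but not another's" situation from the question.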