我正在努力让Titan与Tinkerpop 3.0.1 Hadoop-Gremlin合作,遵循Titan文档here。我基本上正在从this下载页面下载titan-1.0.0-hadoop1。
我几乎完全遵循文档,唯一的区别是我使用的是HBase支持的图形,而不是文档中使用的Cassandra图形。我确信我的'titan-hbase-cluster.properties'文件没有错误,可以在没有任何问题的情况下读取/写入HBase。
因此,按照Titan文档,我在gremlin控制台中发出以下命令:
\,,,/
(o o)
-----oOOo-(3)-oOOo-----
plugin activated: aurelius.titan
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/src/titanTest/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/src/titanTest/lib/logback-classic-1.1.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
23:17:17 INFO org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph - HADOOP_GREMLIN_LIBS is set to: /usr/src/titanTest/lib
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.tinkergraph
gremlin> :load data/grateful-dead-titan-schema.groovy
==>true
==>true
gremlin> graph = TitanFactory.open('conf/titan-hbase-cluster.properties')
==>standardtitangraph[cassandrathrift:[127.0.0.1]]
gremlin> defineGratefulDeadSchema(graph)
==>null
gremlin> graph.close()
==>null
gremlin> hdfs.copyFromLocal('data/grateful-dead.kryo','data/grateful-dead.kryo')
23:22:46 WARN org.apache.hadoop.util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
==>null
gremlin> graph = GraphFactory.open('conf/hadoop-graph/hadoop-load.properties')
==>hadoopgraph[gryoinputformat->nulloutputformat]
gremlin> blvp = BulkLoaderVertexProgram.build().writeGraph('conf/titan-cassandra.properties').create(graph)
==>BulkLoaderVertexProgram[bulkLoader=IncrementalBulkLoader,vertexIdProperty=bulkLoader.vertex.id,userSuppliedIds=false,keepOriginalIds=true,batchSize=0]
gremlin> graph.compute(SparkGraphComputer).program(blvp).submit().get()
运行这些命令后,我收到以下警告和错误字符串
23:23:51 WARN org.apache.tinkerpop.gremlin.hadoop.process.computer.spark.SparkGraphComputer - class org.apache.hadoop.mapreduce.lib.output.NullOutputFormat does not implement PersistResultGraphAware and thus, persistence options are unknown -- assuming all options are possible
23:23:58 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy - Snappy native library not loaded
23:23:58 ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 0.0 (TID 0)
java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:267)
at org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoRecordReader.seekToHeader(GryoRecordReader.java:82)
at org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoRecordReader.initialize(GryoRecordReader.java:74)
at org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat.createRecordReader(GryoInputFormat.java:39)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:107)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
23:23:58 WARN org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 0.0 (TID 0, localhost): java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:267)
at org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoRecordReader.seekToHeader(GryoRecordReader.java:82)
at org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoRecordReader.initialize(GryoRecordReader.java:74)
at org.apache.tinkerpop.gremlin.hadoop.structure.ioofasdsadsa dsadsadsa dsa dsa dsadsa.gryo.GryoInputFormat.createRecordReader(GryoInputFormat.java:39)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:107)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
23:23:58 ERROR org.apache.spark.scheduler.TaskSetManager - Task 0 in stage 0.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent fai.java:82)
at org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoRecordReader.initialize(GryoRecordReader.java:74)
at org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat.createRecordReader(GryoInputFormat.java:39)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:133)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:107)
at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:280)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:247)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
同样令人感兴趣的是“WARN org.apache.hadoop.util.NativeCodeLoader - 无法为您的平台加载native-hadoop库...使用内置的java类(如果适用)”复制到hdfs时出错。有没有人对如何进行有任何建议?我很失落,因为我只是想尽可能地遵循文档并在很早的时候遇到路障。
答案 0 :(得分:0)
你有一个EOFException
,这意味着包含你的数据的文件出了问题。如GryoRecordReader的实现所示,代码似乎没问题。
我的建议是检查您的文件是否正常,可读且包含utf-8字符。