Spark HDF5 (specifically h5spark) java.io.IOException: unexpected exception type

Date: 2018-12-26 23:07:57

Tags: scala apache-spark hdf5

Question:
When attempting to read an HDF5 file in Spark using h5spark, the job fails with java.io.IOException: unexpected exception type.

Setup:
I am running on a 16-node cluster but only using 3 of the nodes. In the output below, hec-16 is the master, while hec-15 and hec-14 are the slaves.

All of the HDF5 libraries are in place and load correctly.

The file I am reading does not reside on a parallel or distributed file system. It lives on a drive mounted on both the master and the slaves, and the file path is identical across nodes.
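A quick way to confirm that every executor actually sees the file at that path is a small probe job along these lines (a sketch only; it assumes the sc created in h5reader.scala further down):

// Sketch: each executor reports whether the shared path is visible locally.
// Assumes `sc` is the SparkContext created in h5reader.scala below.
val path = "/mnt/common/abuchan1/code/c/h5FileCreator/data/testData512.h5"
sc.parallelize(1 to 16, 16)
  .map(_ => (java.net.InetAddress.getLocalHost.getHostName,
             java.nio.file.Files.exists(java.nio.file.Paths.get(path))))
  .distinct()
  .collect()
  .foreach { case (host, ok) => println(s"$host -> $ok") }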

Attempted solutions:

  1. To verify that Spark can read from the mounted drive, I read a text file from it, which worked (roughly the check sketched after this list).

  2. I verified that the file name is the same.

  3. I verified that the dataset name is the same.

  4. I deliberately tried to read from an invalid h5 file and dataset to see whether the same error occurred. I got a different error instead: java.lang.NoSuchMethodError.
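For reference, the step-1 sanity check amounts to the following (the text-file path here is illustrative, not the actual file used):

// Sketch of the step-1 check: read a plain text file from the shared
// mount through Spark. The path below is illustrative.
val lines = sc.textFile("/mnt/common/abuchan1/some/notes.txt")
println("text file line count = " + lines.count())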

Some output:

abuchan1@hec-16:~/programs/spark-2.3.1/bin$ ./spark-submit --master spark://hec-16:7077 --conf "spark.driver.extraClassPath=/mnt/common/abuchan1/programs/hdf/HDFJava-3.3.2-Linux/HDF_Group/HDFJava/3.3.2/lib/jarhdf5-3.3.2.jar" /mnt/common/abuchan1/code/scala/hdf5/target/scala-2.12/hdf5-smart-reader_2.12-1.0.jar
sparkTesting - Utils.scala:2068 = /mnt/common/abuchan1/code/scala/hdf5/target/scala-2.12/hdf5-smart-reader_2.12-1.0.jar
sparkTesting - Utils.scala:2133 = /mnt/common/abuchan1/programs/spark-2.3.1/conf/spark-defaults.conf
18/12/26 17:36:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
sparkTesting - SparkSubmit.scala:912 = /mnt/common/abuchan1/code/scala/hdf5/target/scala-2.12/hdf5-smart-reader_2.12-1.0.jar



java.library.path
/mnt/common/abuchan1/programs/hdf/HDFJava-3.3.2-Linux/HDF_Group/HDFJava/3.3.2/lib:/mnt/common/abuchan1/programs/protobuf-2.5.0/usr/local/lib:/mnt/common/abuchan1/programs/BerkeleyDB/lib:/mnt/common/abuchan1/orangefs/install/lib/:/opt/ohpc/pub/mpi/openmpi3-gnu7/3.0.0/lib:/opt/ohpc/pub/compiler/gcc/7.3.0/lib64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib



18/12/26 17:36:56 INFO SparkContext: Running Spark version 2.3.1
18/12/26 17:36:56 INFO SparkContext: Submitted application: SparkGrep
18/12/26 17:36:56 INFO SecurityManager: Changing view acls to: abuchan1
18/12/26 17:36:56 INFO SecurityManager: Changing modify acls to: abuchan1
18/12/26 17:36:56 INFO SecurityManager: Changing view acls groups to:
18/12/26 17:36:56 INFO SecurityManager: Changing modify acls groups to:
18/12/26 17:36:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(abuchan1); groups with view permissions: Set(); users  with modify permissions: Set(abuchan1); groups with modify permissions: Set()
18/12/26 17:36:56 INFO Utils: Successfully started service 'sparkDriver' on port 44199.
18/12/26 17:36:56 INFO SparkEnv: Registering MapOutputTracker
18/12/26 17:36:56 INFO SparkEnv: Registering BlockManagerMaster
18/12/26 17:36:56 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/12/26 17:36:56 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/12/26 17:36:56 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-1cdf3592-ecff-4aed-aa47-61a9d098c63e
18/12/26 17:36:57 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
18/12/26 17:36:57 INFO SparkEnv: Registering OutputCommitCoordinator
sparkTesting - Utils.scala:859 = /tmp
18/12/26 17:36:57 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/12/26 17:36:57 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://hec-16:4040
18/12/26 17:36:57 INFO SparkContext: Added JAR file:/mnt/common/abuchan1/code/scala/hdf5/target/scala-2.12/hdf5-smart-reader_2.12-1.0.jar at spark://hec-16:44199/jars/hdf5-smart-reader_2.12-1.0.jar with timestamp 1545863817473
18/12/26 17:36:57 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://hec-16:7077...
18/12/26 17:36:57 INFO TransportClientFactory: Successfully created connection to hec-16/192.168.1.17:7077 after 56 ms (0 ms spent in bootstraps)
18/12/26 17:36:57 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20181226173657-0003
18/12/26 17:36:57 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20181226173657-0003/0 on worker-20181226160143-192.168.1.15-36112 (192.168.1.15:36112) with 8 core(s)
18/12/26 17:36:57 INFO StandaloneSchedulerBackend: Granted executor ID app-20181226173657-0003/0 on hostPort 192.168.1.15:36112 with 8 core(s), 1024.0 MB RAM
18/12/26 17:36:57 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20181226173657-0003/1 on worker-20181226160127-192.168.1.16-36524 (192.168.1.16:36524) with 8 core(s)
18/12/26 17:36:57 INFO StandaloneSchedulerBackend: Granted executor ID app-20181226173657-0003/1 on hostPort 192.168.1.16:36524 with 8 core(s), 1024.0 MB RAM
18/12/26 17:36:57 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40816.
18/12/26 17:36:57 INFO NettyBlockTransferService: Server created on hec-16:40816
18/12/26 17:36:57 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/12/26 17:36:57 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, hec-16, 40816, None)
18/12/26 17:36:57 INFO BlockManagerMasterEndpoint: Registering block manager hec-16:40816 with 366.3 MB RAM, BlockManagerId(driver, hec-16, 40816, None)
18/12/26 17:36:57 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, hec-16, 40816, None)
18/12/26 17:36:57 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, hec-16, 40816, None)
18/12/26 17:36:57 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20181226173657-0003/0 is now RUNNING
18/12/26 17:36:57 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20181226173657-0003/1 is now RUNNING
18/12/26 17:36:58 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
18/12/26 17:36:58 INFO read$: Read Single file:/mnt/common/abuchan1/code/c/h5FileCreator/data/testData512.h5
18/12/26 17:36:58 INFO H5: HDF5 library: jhdf5
18/12/26 17:36:58 INFO H5:  successfully loaded from java.library.path
18/12/26 17:36:58 WARN ClosureCleaner: Expected a closure; got org.nersc.read$$$Lambda$19/1061913613
18/12/26 17:36:58 INFO SparkContext: Starting job: count at h5reader.scala:32
18/12/26 17:36:58 INFO DAGScheduler: Got job 0 (count at h5reader.scala:32) with 2 output partitions
18/12/26 17:36:58 INFO DAGScheduler: Final stage: ResultStage 0 (count at h5reader.scala:32)
18/12/26 17:36:58 INFO DAGScheduler: Parents of final stage: List()
18/12/26 17:36:58 INFO DAGScheduler: Missing parents: List()
18/12/26 17:36:58 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at flatMap at read.scala:253), which has no missing parents
18/12/26 17:36:59 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.3 KB, free 366.3 MB)
sparkTesting - DiskBlockManager.scala:68 = /tmp/blockmgr-1cdf3592-ecff-4aed-aa47-61a9d098c63e, 0e
sparkTesting - DiskBlockManager.scala:77 = /tmp/blockmgr-1cdf3592-ecff-4aed-aa47-61a9d098c63e/0e, broadcast_0
18/12/26 17:36:59 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.1 KB, free 366.3 MB)
sparkTesting - DiskBlockManager.scala:68 = /tmp/blockmgr-1cdf3592-ecff-4aed-aa47-61a9d098c63e, 11
sparkTesting - DiskBlockManager.scala:77 = /tmp/blockmgr-1cdf3592-ecff-4aed-aa47-61a9d098c63e/11, broadcast_0_piece0
18/12/26 17:36:59 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on hec-16:40816 (size: 2.1 KB, free: 366.3 MB)
18/12/26 17:36:59 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1039
18/12/26 17:36:59 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at flatMap at read.scala:253) (first 15 tasks are for partitions Vector(0, 1))
18/12/26 17:36:59 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
18/12/26 17:37:01 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.1.15:50878) with ID 0
18/12/26 17:37:01 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.1.15, executor 0, partition 0, PROCESS_LOCAL, 7857 bytes)
18/12/26 17:37:01 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 192.168.1.15, executor 0, partition 1, PROCESS_LOCAL, 7857 bytes)
18/12/26 17:37:01 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.15:35760 with 366.3 MB RAM, BlockManagerId(0, 192.168.1.15, 35760, None)
18/12/26 17:37:01 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.1.16:51442) with ID 1
18/12/26 17:37:01 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.16:40981 with 366.3 MB RAM, BlockManagerId(1, 192.168.1.16, 40981, None)
18/12/26 17:37:02 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.15:35760 (size: 2.1 KB, free: 366.3 MB)
18/12/26 17:37:02 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 192.168.1.15, executor 0): java.io.IOException: unexpected exception type
        at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1682)
        at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1254)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2073)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1248)
        ... 23 more
Caused by: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
        at org.nersc.read$.$deserializeLambda$(read.scala)
        ... 33 more
Caused by: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
        ... 34 more

18/12/26 17:37:04 INFO DAGScheduler: ResultStage 0 (count at h5reader.scala:32) failed in 5.525 s due to Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, 192.168.1.15, executor 0): java.io.IOException: unexpected exception type
        at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1682)
        at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1254)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2073)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1248)
        ... 23 more
Caused by: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
        at org.nersc.read$.$deserializeLambda$(read.scala)
        ... 33 more
Caused by: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
        ... 34 more

Driver stacktrace:
18/12/26 17:37:04 INFO DAGScheduler: Job 0 failed: count at h5reader.scala:32, took 5.636608 s
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 6, 192.168.1.15, executor 0): java.io.IOException: unexpected exception type
        at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1682)
        at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1254)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2073)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1248)
        ... 23 more
Caused by: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
        at org.nersc.read$.$deserializeLambda$(read.scala)
        ... 33 more
Caused by: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
        ... 34 more

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1602)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1590)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1589)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1589)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
        at scala.Option.foreach(Option.scala:257)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1823)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2037)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2058)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2077)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2102)
        at org.apache.spark.rdd.RDD.count(RDD.scala:1162)
        at h5reader$.main(h5reader.scala:32)
        at h5reader.main(h5reader.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:895)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: unexpected exception type
        at java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1682)
        at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1254)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2073)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2282)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2206)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2064)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1568)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:428)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.lang.invoke.SerializedLambda.readResolve(SerializedLambda.java:230)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadResolve(ObjectStreamClass.java:1248)
        ... 23 more
Caused by: java.lang.BootstrapMethodError: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
        at org.nersc.read$.$deserializeLambda$(read.scala)
        ... 33 more
Caused by: java.lang.NoClassDefFoundError: scala/runtime/LambdaDeserialize
        ... 34 more

Code:

h5reader.scala

import hdf.hdf5lib.H5
import hdf.hdf5lib.HDF5Constants
import java.nio.file.{Paths, Files}
import java.io.{BufferedWriter, FileWriter}
import java.io.File
import math._
import org.nersc.read
import scala.io.Source
import hdf5reader.h5util

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object h5reader {
  val debug: Boolean = false

  def main(args: Array[String]): Unit = {

    println("\n\n\njava.library.path\n" + System.getProperty("java.library.path") + "\n\n\n")

    val conf = new SparkConf().setAppName("SparkGrep").setMaster("spark://hec-16:7077")
    val sc = new SparkContext(conf)

    // Read the dataset into an RDD via h5spark (sc, file path, dataset name,
    // partition count), then force evaluation with count() and time both steps.
    val a = System.nanoTime()
    val rdd = read.h5read_array(sc, "/mnt/common/abuchan1/code/c/h5FileCreator/data/testData512.h5", "/Compressed_Data", 2)
    val b = System.nanoTime()
    val count = rdd.count()
    val c = System.nanoTime()
    println("\n\n\nExecution time:\t" + (c - a).toString)
    println("Count time:\t" + (c - b).toString)
  }
}

read.scala

This is from the h5spark package, which I believe is where part of the error originates. The source code can be found here: H5Spark github
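One detail from the stack trace is worth flagging here: scala/runtime/LambdaDeserialize exists only in the Scala 2.12 standard library, and the jar name (hdf5-smart-reader_2.12-1.0.jar) shows this project is built for Scala 2.12, while the stock prebuilt Spark 2.3.1 distribution ships with Scala 2.11. If the cluster runs that stock build, pinning the project to Scala 2.11 in build.sbt would align the two; a minimal sketch (the dependency line is an assumption, not the project's actual build.sbt):

// build.sbt -- sketch pinning the project to the Scala line bundled with
// prebuilt Spark 2.3.1 (2.11.x); adjust if the cluster runs a custom build.
name := "hdf5-smart-reader"
version := "1.0"
scalaVersion := "2.11.12"
// Assumed dependency; the real build.sbt contents are not shown above.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.1" % "provided"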

Environment:

Project directory

abuchan1@hec:~/code/scala/hdf5$ ls
build.sbt  h5reader.scala  h5util.scala  lib  project  read.scala  target

Data directory

abuchan1@hec:~/code/c/h5FileCreator/data$ ls
testData128.h5  testData256.h5  testData512.h5  testData64.h5

H5 file

abuchan1@hec:~/code/c/h5FileCreator/data$ h5dump -p ./testData512.h5
HDF5 "./testData512.h5" {
GROUP "/" {
   DATASET "Compressed_Data" {
      DATATYPE  H5T_STD_I32BE
      DATASPACE  SIMPLE { ( 1024, 1024, 1024 ) / ( 1024, 1024, 1024 ) }
      STORAGE_LAYOUT {
         CHUNKED ( 512, 512, 512 )
         SIZE 20206311 (212.556:1 COMPRESSION)
      }
      FILTERS {
         COMPRESSION DEFLATE { LEVEL 1 }
      }
      FILLVALUE {
         FILL_TIME H5D_FILL_TIME_IFSET
         VALUE  0
      }
      ALLOCATION_TIME {
         H5D_ALLOC_TIME_INCR
      }
   }
}
}
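As an extra local check (outside Spark entirely), the same jarhdf5 bindings the reader imports can open this file and dataset directly; a sketch, with the hdf.hdf5lib call signatures assumed from the HDF5 Java wrapper API:

// Sketch: open the file and dataset locally through hdf.hdf5lib,
// with no Spark involved; signatures assumed from the HDF5 Java API.
import hdf.hdf5lib.{H5, HDF5Constants}

val fid = H5.H5Fopen("/mnt/common/abuchan1/code/c/h5FileCreator/data/testData512.h5",
                     HDF5Constants.H5F_ACC_RDONLY, HDF5Constants.H5P_DEFAULT)
val did = H5.H5Dopen(fid, "/Compressed_Data", HDF5Constants.H5P_DEFAULT)
println("opened dataset, id = " + did)
H5.H5Dclose(did)
H5.H5Fclose(fid)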

0 Answers:

No answers yet.