How to incrementally extend an unlimited HDF5 dataset in Java

Asked: 2015-07-25 15:51:24

Tags: java hdf5

My basic use case is very simple. I do not know in advance how large the dataset will become. I run in a loop and have to store data as it arrives. For this I am trying to write a utility class that I would like to use like this:

    H5File file = new H5File( Paths.get( file ) );

    file.create();
    file.createDataset( dataset ); // empty

    file.extendDataset( dataset, data ); // append some data
    file.extendDataset( dataset, data ); // append some data
    // ... do this some more

    file.close();

My problem is the implementation of the extendDataset member function. I understand that for this I have to select a hyperslab with H5.H5Sselect_hyperslab and then write to the new selection with H5.H5Dwrite. The relevant part of my code looks like this:

boolean extendDataset( String param, float[] data )
{
    try
    {
        long[] extdims = new long[1];

        H5ScalarDS dataset = (H5ScalarDS) h5File.get( param );

        int dataset_id = dataset.open();
        int dataspace_id = H5.H5Dget_space( dataset_id );

        H5.H5Sget_simple_extent_dims( dataspace_id, extdims, null );

        long[] start = extdims.clone();
        extdims[0] += data.length;

        H5.H5Sclose(dataspace_id);

        dataset.extend( extdims );

        dataspace_id = H5.H5Dget_space(dataset_id);

        long[] count = { data.length };
        start[ 0 ] = extdims[0] - data.length;

        H5.H5Sselect_hyperslab(
                dataspace_id,
                HDF5Constants.H5S_SELECT_SET,
                start, null,
                count, null );

        float [] extData = new float[ (int)extdims[0] ];
        System.arraycopy( data, 0, extData, (int)start[0], data.length );

        // Write the data to the selected portion of the dataset.
        H5.H5Dwrite(
                dataset_id,
                HDF5Constants.H5T_NATIVE_FLOAT,
                HDF5Constants.H5S_ALL,
                dataspace_id,
                HDF5Constants.H5P_DEFAULT,
                extData );

        dataset.close( dataset_id );
    }
    catch( Exception e )
    {
        e.printStackTrace();
        return false;
    }

    return true;
}

The problem I have with this code is these two lines:

        float [] extData = new float[ (int)extdims[0] ];
        System.arraycopy( data, 0, extData, (int)start[0], data.length );

I have not been able to get this working without making the buffer extData as large as the whole dataset, which is exactly what I want to avoid. Is there a way to pass H5.H5Dwrite a buffer that is only the size of the hyperslab?

Here is the complete source code; it should be easy to run.

  package h5;

  import ncsa.hdf.hdf5lib.H5;
  import ncsa.hdf.hdf5lib.HDF5Constants;
  import ncsa.hdf.hdf5lib.exceptions.HDF5Exception;
  import ncsa.hdf.object.Datatype;
  import ncsa.hdf.object.FileFormat;
  import ncsa.hdf.object.Group;
  import ncsa.hdf.object.h5.H5Datatype;
  import ncsa.hdf.object.h5.H5ScalarDS;

  import java.nio.file.Path;
  import java.nio.file.Paths;

  /**
   * Created by Thomas on 24.07.2015.
   */
  public class H5File
  {

      @FunctionalInterface
      public interface IFloatGenerator
      {
          float generate( int t );
      }

      public static
      float[]
      generate( float [] vs, IFloatGenerator gen )
      {
          for( int i = 0; i < vs.length; ++i  )
          {
              vs[i] = gen.generate( i );
          }

          return vs;
      }


      Path path;
      ncsa.hdf.object.h5.H5File h5File;
      final H5Datatype floatType = new H5Datatype(Datatype.CLASS_FLOAT, 4, Datatype.NATIVE, -1);

      private static final long[] dims = { 0 };
      private static final long[] maxdims = { HDF5Constants.H5S_UNLIMITED };
      private static final long[] chunks = { 16384 };

      public H5File( Path path )
      {
          this.path = path;
      }

      boolean open()
      {
          h5File = new ncsa.hdf.object.h5.H5File( path.toString(), FileFormat.WRITE );
          try
          {
              h5File.open();
          }
          catch( Exception e )
          {
              e.printStackTrace();
              return  false;
          }

          return true;
      }

      boolean create()
      {
          h5File = new ncsa.hdf.object.h5.H5File( path.toString(), FileFormat.CREATE);
          try
          {
              h5File.open();
          }
          catch( Exception e )
          {
              e.printStackTrace();
              return false;
          }

          return true;
      }

      boolean close()
      {
          try
          {
              h5File.close();
          }
          catch( HDF5Exception e )
          {
              e.printStackTrace();
              return false;
          }

          return true;
      }

      boolean createDataset( String name )
      {
          Group root = (Group) ((javax.swing.tree.DefaultMutableTreeNode) h5File.getRootNode()).getUserObject();

          try
          {
              h5File.createScalarDS( name, root, floatType, dims, maxdims, chunks, 0, null);
          }
          catch( Exception e )
          {
              e.printStackTrace();
              return false;
          }

          return true;
      }

      boolean extendDataset( String param, float[] data )
      {
          try
          {
              long[] extdims = new long[1];

              H5ScalarDS dataset = (H5ScalarDS) h5File.get( param );

              int dataset_id = dataset.open();
              int dataspace_id = H5.H5Dget_space( dataset_id );

              H5.H5Sget_simple_extent_dims( dataspace_id, extdims, null );

              long[] start = extdims.clone();
              extdims[0] += data.length;

              H5.H5Sclose(dataspace_id);

              dataset.extend( extdims );

              dataspace_id = H5.H5Dget_space(dataset_id);

              long[] count = { data.length };
              start[ 0 ] = extdims[0] - data.length;

              H5.H5Sselect_hyperslab(
                      dataspace_id,
                      HDF5Constants.H5S_SELECT_SET,
                      start, null,
                      count, null );

              float [] extData = new float[ (int)extdims[0] ];
              System.arraycopy( data, 0, extData, (int)start[0], data.length );

              // Write the data to the selected portion of the dataset.
              H5.H5Dwrite(
                      dataset_id,
                      HDF5Constants.H5T_NATIVE_FLOAT,
                      HDF5Constants.H5S_ALL,
                      dataspace_id,
                      HDF5Constants.H5P_DEFAULT,
                      extData );

              dataset.close( dataset_id );
          }
          catch( Exception e )
          {
              e.printStackTrace();
              return false;
          }

          return true;
      }



      public static void main(String[] argv)
      {
          H5File file = new H5File( Paths.get( "test.h5" ) );

          String name = "floats";
          float[] data = new float[ 10 ];

          file.create();
          file.createDataset( name );
          generate( data, i -> i );
          file.extendDataset( name, data );
          generate( data, i -> 10 + i );
          file.extendDataset( name, data );
          file.close();
      }


  }

1 Answer:

Answer 0 (score: 1)

Is there a way to pass H5.H5Dwrite a buffer that is only the size of the hyperslab?

Yes. The reason the buffer passed to H5Dwrite has to match the full dataset is the H5S_ALL argument for the memory dataspace. Instead, a memory dataspace whose extent matches data must be used. It can be created with H5Screate_simple:
 long[] mem_dim = { data.length };
 // A 1-D memory dataspace describing the in-memory buffer (length data.length).
 int mem_space_id = H5.H5Screate_simple( 1, mem_dim, mem_dim );

 H5.H5Dwrite(
          dataset_id,
          HDF5Constants.H5T_NATIVE_FLOAT,
          mem_space_id,     // was HDF5Constants.H5S_ALL
          dataspace_id,
          HDF5Constants.H5P_DEFAULT,
          data );           // write data directly; no oversized extData copy needed

 H5.H5Sclose( mem_space_id );
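For reference, the index bookkeeping that extendDataset relies on (read the old extent, extend the dataset, select a hyperslab at start = old extent with count = data.length, then write only the new chunk) can be sanity-checked without HDF5 at all. The sketch below mirrors those steps with a plain array; the class and method names are illustrative only and not part of the HDF5 API:

```java
import java.util.Arrays;

public class AppendBookkeeping
{
    // Stands in for the on-disk dataset; growing it plays the role of
    // H5Sget_simple_extent_dims followed by dataset.extend().
    static float[] store = new float[0];

    // Mirrors extendDataset: the write offset is the OLD extent, and the
    // buffer handed to the write is only data.length floats long -- the
    // role the H5Screate_simple memory dataspace plays in the answer.
    static void append( float[] data )
    {
        long[] start   = { store.length };            // old extent
        long[] count   = { data.length };             // hyperslab size
        long[] extdims = { start[0] + count[0] };     // new extent

        store = Arrays.copyOf( store, (int) extdims[0] );                   // dataset.extend
        System.arraycopy( data, 0, store, (int) start[0], (int) count[0] ); // H5Dwrite
    }

    public static void main( String[] argv )
    {
        append( new float[]{ 0, 1, 2 } );
        append( new float[]{ 3, 4 } );
        System.out.println( Arrays.toString( store ) ); // [0.0, 1.0, 2.0, 3.0, 4.0]
    }
}
```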