TFRecords for video

Time: 2019-04-04 17:54:17

Tags: python-2.7 tensorflow deep-learning tensorflow-datasets tfrecord

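I am trying to create TFRecords from a custom video dataset, but I am having trouble fully understanding how to set them up.

To prepare the data for storage, I wrote a script that, for a given video feed, outputs a 3D cube of shape [N_FRAMES, WIDTH, HEIGHT, CHANNEL]. I then create the tfrecord as follows: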

def _int64_feature(self, value):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def _bytes_feature(self, value):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def createDataRecord(self, file_name, locations, categories):
    writer = tf.python_io.TFRecordWriter(file_name)

    feature = {}

    for loc, category in zip(locations, categories):
        data = self.video3D(loc)  # the final array of shape [N_FRAMES, WIDTH, HEIGHT, CHANNEL]

        feature['height'] = self._int64_feature(self.height)
        feature['width'] = self._int64_feature(self.width)
        feature['depth'] = self._int64_feature(self.depth)
        feature['data'] = self._bytes_feature(data.tobytes())  # tobytes() replaces the deprecated tostring()
        feature['category'] = self._int64_feature(category)

        example = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example.SerializeToString())

    writer.close()

My current parser function is then as follows:

def readDataRecord(self, record):
  filename_queue = tf.train.string_input_producer([record], num_epochs=1)

  reader = tf.TFRecordReader()
  _, serialized_example = reader.read(filename_queue)

  feature = {
    'height': tf.FixedLenFeature([], tf.int64),
    'width': tf.FixedLenFeature([], tf.int64),
    'depth': tf.FixedLenFeature([], tf.int64),
    'data': tf.FixedLenFeature([], tf.string),
    'category': tf.FixedLenFeature([], tf.int64),
  }

  example = tf.parse_single_example(serialized_example, features=feature)

  video3D_buffer = tf.reshape(example['data'], shape=[])
  video3D = tf.decode_raw(video3D_buffer, tf.uint8)

  label = tf.cast(example['category'], tf.int32)

  return video3D, label

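That said, my questions are:

  1. I know readDataRecord() is wrong, because it is written for single frames. How exactly do I get it to return a single 3D cube of shape [N_FRAMES, WIDTH, HEIGHT, CHANNEL] together with its category?

  2. Is it even a good idea to simply save the whole 3D cube?

Any help or guidance would be greatly appreciated :)

PS: I have looked into other approaches, including video2tfrecord, but most of them seem to save individual frames for each video, which I don't want to do.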

3 answers:

Answer 0 (score: 1)

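A drawback of the accepted answer is that you have to store the array's dimensions (NUM_FRAMES, WIDTH, HEIGHT, CHANNEL) somewhere. A workaround is to serialize the whole 3D cube with tf.io.serialize_tensor(array.astype(...)), save it to the TFRecord as a bytes feature, and then (after loading the TFRecord) restore it with tf.io.parse_tensor(bytestring_array_feature, out_type=...). There is a good explanation here: https://stackoverflow.com/a/60283571 (scroll down to the paragraph about _bytes_feature)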
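
A minimal sketch of that round trip, assuming TensorFlow 2.x with eager execution and a uint8 cube. The helper names cube_to_example and parse_example are made up for illustration and are not from the linked answer:

```python
import numpy as np
import tensorflow as tf


def cube_to_example(cube, category):
    """Pack one video cube and its label into a tf.train.Example (hypothetical helper)."""
    # serialize_tensor stores dtype and shape alongside the values,
    # so the dimensions do not need to be saved as separate features.
    serialized = tf.io.serialize_tensor(cube.astype(np.uint8))
    feature = {
        'data': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[serialized.numpy()])),
        'category': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[category])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def parse_example(serialized_example):
    """Recover (cube, label) from one serialized Example (hypothetical helper)."""
    spec = {
        'data': tf.io.FixedLenFeature([], tf.string),
        'category': tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized_example, spec)
    # out_type must match the dtype that was used when serializing.
    cube = tf.io.parse_tensor(parsed['data'], out_type=tf.uint8)
    label = tf.cast(parsed['category'], tf.int32)
    return cube, label
```

In a tf.data pipeline, parse_example could be passed to dataset.map(); in graph mode the static shape of the cube is unknown, so a tf.ensure_shape or tf.reshape call may still be needed before batching.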

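Answer 1 (score: 0)

This is what I ended up doing, without having to encode individual frames.

I ended up flattening the cube and then writing it out, as follows: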

def _cube_feature(self, value):
    return tf.train.Feature(float_list=tf.train.FloatList(value=value))

def createDataRecord(self, name, locations, categories):

    writer = tf.python_io.TFRecordWriter(name)

    feature = {}

    for loc, category in zip(locations, categories):
        data = self.video3D(loc)
        .............
        feature['data'] = self._cube_feature(data.flatten())
        feature['category'] = self._int64_feature(category)

        example = tf.train.Example(features=tf.train.Features(feature=feature))
        writer.write(example.SerializeToString())

    writer.close()

The resulting parser is:

def readDataRecord(self, record):
    ..........
    feature = \
    {'height': tf.FixedLenFeature([], tf.int64),
     'width': tf.FixedLenFeature([], tf.int64),
     'depth': tf.FixedLenFeature([], tf.int64),
     'data': tf.FixedLenFeature((NUM_FRAMES, WIDTH, HEIGHT, CHANNEL), tf.float32),
     'category': tf.FixedLenFeature([], tf.int64),
    }

    example = tf.parse_single_example(serialized_example, features=feature)

    cube = tf.cast(example['data'], tf.uint8)
    label = tf.cast(example['category'], tf.int32)

    return cube, label
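
For comparison, if the cube is kept as raw bytes (as in the question) instead of a float list, the decoded buffer can be reshaped using dimensions stored in the same record. This is only a sketch under assumptions: it uses TensorFlow 2.x names (tf.io.*), and the 'shape' feature and both helper names are invented for illustration:

```python
import numpy as np
import tensorflow as tf


def make_example(cube, category):
    """Write the cube as raw uint8 bytes plus its 4 dimensions (hypothetical helper)."""
    feature = {
        # Assumed layout: [N_FRAMES, WIDTH, HEIGHT, CHANNEL], as in the question.
        'shape': tf.train.Feature(
            int64_list=tf.train.Int64List(value=list(cube.shape))),
        'data': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[cube.astype(np.uint8).tobytes()])),
        'category': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[category])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def parse_example(serialized_example):
    """Decode the byte string and restore the original 4-D shape."""
    spec = {
        'shape': tf.io.FixedLenFeature([4], tf.int64),
        'data': tf.io.FixedLenFeature([], tf.string),
        'category': tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized_example, spec)
    flat = tf.io.decode_raw(parsed['data'], tf.uint8)  # 1-D uint8 tensor
    cube = tf.reshape(flat, parsed['shape'])           # back to 4-D
    label = tf.cast(parsed['category'], tf.int32)
    return cube, label
```

Storing uint8 bytes this way takes one byte per value, whereas a FloatList stores every value as a 4-byte float.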

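Answer 2 (score: 0)

Another drawback of the accepted answer is that it produces very large data files, because it does not take advantage of any compression (MBs of video data turn into GBs).

What you should do instead is store the video data as a list of JPEG-encoded frames (a blog post plus code showing how to do it can be found here: https://gebob19.github.io/tfrecords/).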
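
To get a feel for the numbers, here is a rough calculation of the uncompressed size of one clip. The clip dimensions are made up for illustration (300 frames of 224x224 RGB), and the function name is hypothetical:

```python
def raw_cube_bytes(n_frames, width, height, channels, bytes_per_value):
    """Uncompressed size in bytes of one [N_FRAMES, WIDTH, HEIGHT, CHANNEL] cube."""
    return n_frames * width * height * channels * bytes_per_value


# Hypothetical clip: 300 frames, 224x224 pixels, 3 channels.
as_uint8 = raw_cube_bytes(300, 224, 224, 3, 1)    # one byte per value
as_float32 = raw_cube_bytes(300, 224, 224, 3, 4)  # FloatList stores 4-byte floats

print(as_uint8)    # 45158400 bytes, about 45 MB
print(as_float32)  # 180633600 bytes, about 181 MB
```

A few dozen such clips stored as floats already add up to several GB.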
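
One possible sketch of the idea, not the code from the linked post. It assumes TensorFlow 2.3 or later with eager execution, uint8 RGB frames, and two hypothetical helper names. Note that tf.io.encode_jpeg treats each frame as [height, width, channels], and that JPEG is lossy, so the decoded pixels will not match the originals exactly:

```python
import numpy as np
import tensorflow as tf


def video_to_example(cube, category):
    """Store each frame of the cube as its own JPEG byte string (hypothetical helper)."""
    frames = [tf.io.encode_jpeg(frame).numpy() for frame in cube.astype(np.uint8)]
    feature = {
        'frames': tf.train.Feature(bytes_list=tf.train.BytesList(value=frames)),
        'category': tf.train.Feature(
            int64_list=tf.train.Int64List(value=[category])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def parse_video(serialized_example):
    """Decode the JPEG frames back into one [N_FRAMES, H, W, 3] uint8 tensor."""
    spec = {
        # VarLenFeature lets the number of frames differ between videos.
        'frames': tf.io.VarLenFeature(tf.string),
        'category': tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized_example, spec)
    jpeg_frames = tf.sparse.to_dense(parsed['frames'], default_value='')
    video = tf.map_fn(
        lambda f: tf.io.decode_jpeg(f, channels=3),
        jpeg_frames,
        fn_output_signature=tf.uint8)
    label = tf.cast(parsed['category'], tf.int32)
    return video, label
```

This still keeps one record per video, so the whole cube comes back from a single parse call even though the frames are compressed individually.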