Question

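The release of AudioSet opens up a brand new area of research for people who do sound analysis, and I have spent the last few days trying to dig into how to analyze and decode this data.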

The data is provided as .tfrecord files; here is a small snippet.

�^E^@^@^@^@^@^@C�bd
u
^[
^Hvideo_id^R^O

^KZZcwENgmOL0
^^
^Rstart_time_seconds^R^H^R^F
^D^@^@�C
^X
^Flabels^R^N^Z^L

�^B�^B�^B�^B�^B
^\
^Pend_time_seconds^R^H^R^F
^D^@^@�C^R�

�

^Oaudio_embedding^R�

�^A
�^A
�^A3�^] q^@�Z�r�����w���Q����.���^@�b�{m�^@P^@^S����,^]�x�����:^@����^@^@^Z0��^@]^Gr?v(^@^U^@��^EZ6�$
�^A

The example proto given is:

context: {
  feature: {
    key  : "video_id"
    value: {
      bytes_list: {
        value: [YouTube video id string]
      }
    }
  }
  feature: {
    key  : "start_time_seconds"
    value: {
      float_list: {
        value: 6.0
      }
    }
  }
  feature: {
    key  : "end_time_seconds"
    value: {
      float_list: {
        value: 16.0
      }
    }
  }
  feature: {
    key  : "labels"
    value: {
      int64_list: {
        value: [1, 522, 11, 172] # The meaning of the labels can be found here.
      }
    }
  }
}
feature_lists: {
  feature_list: {
    key  : "audio_embedding"
    value: {
      feature: {
        bytes_list: {
          value: [128 8bit quantized features]
        }
      }
      feature: {
        bytes_list: {
          value: [128 8bit quantized features]
        }
      }
    }
    ... # Repeated for every second of the segment
  }

}

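My very straightforward question here, which I can't seem to find good information on, is: how do I convert cleanly between the two?

If I have the machine-readable file, how do I make it human-readable, and the other way around?

I have found this, which takes a tfrecord of pictures and converts it into a readable format... but I can't seem to get it into a form that works with AudioSet.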

Answer 1

This worked for me, storing the features in feat_audio. To plot them, convert them to an ndarray and reshape them accordingly.

import tensorflow as tf  # written against the TensorFlow 1.x API

audio_record = '/audioset_v1_embeddings/eval/_1.tfrecord'
vid_ids = []
labels = []
start_time_seconds = []  # in seconds
end_time_seconds = []
feat_audio = []

# One session for the whole file instead of a new one per record.
sess = tf.InteractiveSession()
for example in tf.python_io.tf_record_iterator(audio_record):
    # Each record is a SequenceExample: per-clip metadata lives in `context`,
    # per-second embeddings in `feature_lists`.
    tf_seq_example = tf.train.SequenceExample.FromString(example)
    context = tf_seq_example.context.feature
    vid_ids.append(context['video_id'].bytes_list.value[0].decode(encoding='UTF-8'))
    labels.append(context['labels'].int64_list.value)
    start_time_seconds.append(context['start_time_seconds'].float_list.value)
    end_time_seconds.append(context['end_time_seconds'].float_list.value)

    frames = tf_seq_example.feature_lists.feature_list['audio_embedding'].feature
    audio_frame = []
    # Decode each frame's 128 raw bytes into a float32 vector.
    for frame in frames:
        audio_frame.append(tf.cast(
            tf.decode_raw(frame.bytes_list.value[0], tf.uint8),
            tf.float32).eval())

    # Same layout as before: one single-element list per record.
    feat_audio.append([audio_frame])

sess.close()
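
As a side note, the per-frame decoding above does not strictly need a TensorFlow session, since each embedding is just a string of raw bytes with one uint8 feature per byte. Here is a minimal sketch in plain Python, assuming every frame is exactly 128 bytes long. decode_frames is a hypothetical helper name and the input below is fake data, not real AudioSet content.

```python
def decode_frames(raw_frames, dim=128):
    """Turn a list of raw byte strings into a list of float rows."""
    rows = []
    for raw in raw_frames:
        if len(raw) != dim:
            raise ValueError("expected %d bytes, got %d" % (dim, len(raw)))
        # Iterating over bytes yields ints in 0..255, one per feature.
        rows.append([float(b) for b in raw])
    return rows


# Fake stand-in for two seconds of audio_embedding frames.
fake_frames = [bytes(range(128)), bytes([255] * 128)]
matrix = decode_frames(fake_frames)
print(len(matrix), len(matrix[0]))  # 2 128
```

The result is a list of rows with 128 values each, which can be passed to numpy.array if you want an ndarray for plotting.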

Answer 2

The AudioSet data is not a tensorflow.Example protobuf, like the image example you linked. It is a SequenceExample.

I haven't tested this, but you should be able to use the code you linked if you replace tf.parse_single_example with tf.parse_single_sequence_example (and replace the field names).
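
For context on why the dump in the question looks unreadable: a .tfrecord file is a sequence of length-prefixed binary records, each holding one serialized protobuf message. Below is a minimal sketch of that framing in plain Python, assuming the usual TFRecord layout (8-byte length, 4-byte CRC, payload, 4-byte CRC, all little-endian). It skips CRC verification, iter_tfrecord_payloads is a hypothetical helper name that is not part of TensorFlow, and the demo bytes use dummy zero CRCs.

```python
import io
import struct


def iter_tfrecord_payloads(stream):
    """Yield the raw payload bytes of each record in a TFRecord stream.

    The two CRC fields are read and discarded, not verified.
    """
    while True:
        header = stream.read(12)  # 8-byte length + 4-byte CRC of the length
        if not header:
            return  # clean end of file
        if len(header) < 12:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header[:8])
        payload = stream.read(length)
        footer = stream.read(4)  # 4-byte CRC of the payload
        if len(payload) < length or len(footer) < 4:
            raise ValueError("truncated record body")
        yield payload


# Demo on an in-memory stream with one fake record and zeroed CRCs.
demo = io.BytesIO(struct.pack("<Q", 5) + b"\0" * 4 + b"hello" + b"\0" * 4)
for payload in iter_tfrecord_payloads(demo):
    print(payload)
```

With a real file, each payload could be handed to tf.train.SequenceExample.FromString, and printing the resulting message gives the human-readable text form shown in the question.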

Answer 3

This is what I have done so far. prepare_serialized_examples is from the youtube-8m starter code. Hope it helps :)

import tensorflow as tf  # written against the TensorFlow 1.x queue-based input API


def prepare_serialized_examples(serialized_example, max_quantized_value=2, min_quantized_value=-2):
    # max_quantized_value / min_quantized_value are kept from the starter-code
    # signature; this version does not dequantize, so they are unused.
    contexts, features = tf.parse_single_sequence_example(
        serialized_example,
        context_features={"video_id": tf.FixedLenFeature([], tf.string),
                          "labels": tf.VarLenFeature(tf.int64)},
        # One byte string per frame, so the per-step shape is [].
        sequence_features={'audio_embedding': tf.FixedLenSequenceFeature([], dtype=tf.string)})

    # Decode each frame's bytes into 128 uint8 values, then cast to float32.
    decoded_features = tf.reshape(
        tf.cast(tf.decode_raw(features['audio_embedding'], tf.uint8), tf.float32),
        [-1, 128])

    return contexts, decoded_features


filename = '/audioset_v1_embeddings/bal_train/2a.tfrecord'
filename_queue = tf.train.string_input_producer([filename], num_epochs=1)

reader = tf.TFRecordReader()

with tf.Session() as sess:

    _, serialized_example = reader.read(filename_queue)
    context, features = prepare_serialized_examples(serialized_example)

    # num_epochs is tracked in a local variable, so local variables
    # have to be initialized as well.
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    print(sess.run(features))

    coord.request_stop()
    coord.join(threads)
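
The max_quantized_value and min_quantized_value arguments above hint at a dequantization step that maps the 8-bit values back to approximate floats. Here is a minimal sketch of that step in plain Python. The scale and bias formula follows the Dequantize helper in the YouTube-8M starter code as I understand it, and the [-2, 2] range is an assumption carried over from the defaults above, so check both against that repository before relying on them. dequantize is a hypothetical name.

```python
def dequantize(values, max_quantized_value=2.0, min_quantized_value=-2.0):
    """Map 8-bit quantized values (0..255) back to approximate floats."""
    quantized_range = max_quantized_value - min_quantized_value
    scalar = quantized_range / 255.0
    # Assumed bias term: shifts each value by half a quantization step.
    bias = quantized_range / 512.0 + min_quantized_value
    return [v * scalar + bias for v in values]


# Fake stand-in for a few bytes of one embedding frame.
frame = bytes([0, 128, 255])
print(dequantize(frame))
```

Under these assumptions, byte 0 lands just above the minimum and byte 255 just above the maximum of the range.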

Answer 4

The YouTube-8M starter code should work out of the box with the AudioSet tfrecord files.
