Principal component analysis in an RNN

Date: 2017-04-26 12:28:38

Tags: python tensorflow pca

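If I want to map sequences (features) A, B, and C onto a target sequence with a TensorFlow LSTM, how can I tell how important each feature is in influencing the target? Would principal component analysis help? If PCA helps, how should I apply it?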

The structure (columns) of the dataset looks like this:

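import speech_recognition as sr

r = sr.Recognizer()
framerate = 100  # frames per second used to convert frame indices to seconds
with sr.AudioFile("1.wav") as source:
    audio = r.record(source)

# show_all=True returns the Sphinx decoder object, which exposes seg();
# with show_all=False only the recognized text string is returned.
decoder = r.recognize_sphinx(audio, show_all=True)
print([(seg.word, seg.start_frame / framerate) for seg in decoder.seg()])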

1 Answer:

Answer 0 (score: 0)

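What are the principal components of this sequence? One thing you can do is take the PCA of the A, B, and C sequences and visualize it. Here is a simple tutorial on visualizing PCA with TensorBoard: http://www.pinchofintelligence.com/simple-introduction-to-tensorboard-embedding-visualisation/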

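# Jupyter notebook magic; remove this line when running as a plain script
%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np
import os

# Both imports below require TensorFlow 1.x (tf.contrib was removed in 2.0)
from tensorflow.contrib.tensorboard.plugins import projector
from tensorflow.examples.tutorials.mnist import input_data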

LOG_DIR = 'minimalsample'
NAME_TO_VISUALISE_VARIABLE = "mnistembedding"
TO_EMBED_COUNT = 500


path_for_mnist_sprites =  os.path.join(LOG_DIR,'mnistdigits.png')
path_for_mnist_metadata =  os.path.join(LOG_DIR,'metadata.tsv')
mnist = input_data.read_data_sets("MNIST_data/", one_hot=False)
batch_xs, batch_ys = mnist.train.next_batch(TO_EMBED_COUNT)
embedding_var = tf.Variable(batch_xs, name=NAME_TO_VISUALISE_VARIABLE)
summary_writer = tf.summary.FileWriter(LOG_DIR)
config = projector.ProjectorConfig()
embedding = config.embeddings.add()
embedding.tensor_name = embedding_var.name

# Specify where you find the metadata
embedding.metadata_path = path_for_mnist_metadata #'metadata.tsv'

# Specify where you find the sprite (we will create this later)
embedding.sprite.image_path = path_for_mnist_sprites #'mnistdigits.png'
embedding.sprite.single_image_dim.extend([28,28])

# Say that you want to visualise the embeddings
projector.visualize_embeddings(summary_writer, config)
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
saver = tf.train.Saver()
saver.save(sess, os.path.join(LOG_DIR, "model.ckpt"), 1)
with open(path_for_mnist_metadata,'w') as f:
    f.write("Index\tLabel\n")
    for index,label in enumerate(batch_ys):
        f.write("%d\t%d\n" % (index,label))
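
For the three sequences from the question, the PCA itself can also be computed directly before visualizing anything. The following is only a minimal sketch using NumPy: the arrays `A`, `B`, and `C` are synthetic stand-ins (not the asker's data), and the helper name `pca` is hypothetical. It prints the share of variance captured by each component and the loadings that show how much each feature contributes to each component. Note that this describes the variance among the input features only; it does not by itself measure their influence on the target.

```python
import numpy as np


def pca(features):
    """PCA of a (timesteps, n_features) array via SVD.

    Returns (explained_variance_ratio, components), where each row of
    components is one principal direction over the feature columns.
    """
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / (len(features) - 1)
    return variances / variances.sum(), vt


# Synthetic stand-ins for sequences A, B, and C (assumption, for illustration)
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
A = np.sin(t)
B = 2.0 * A + 0.1 * rng.normal(size=t.size)  # strongly correlated with A
C = rng.normal(size=t.size)                  # independent noise
X = np.column_stack([A, B, C])

ratio, components = pca(X)
print("explained variance ratio:", ratio)
print("loadings (rows = components, columns = A, B, C):")
print(components)
```

If the features are on very different scales, dividing each centered column by its standard deviation before the SVD is a common choice, since otherwise the feature with the largest scale tends to dominate the first component.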

Hope this helps you think about PCA!