TensorFlow: classifying positive or negative phrases with a neural network

Time: 2017-03-22 11:58:16

Tags: python machine-learning tensorflow neural-network

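I am working through the tutorial here: https://pythonprogramming.net/train-test-tensorflow-deep-learning-tutorial/

I can train the neural network and print out the accuracy.

However, I don't know how to use the neural network to make a prediction.

Here is my attempt. The specific problem is the line below. I believe my issue is that I cannot convert my input string into the format the model expects: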

features = get_features_for_input("This was the best store i've ever seen.")
result = (sess.run(tf.argmax(prediction.eval(feed_dict={x:features}),1)))

Here is a larger listing:

def train_neural_network(x):
    prediction = neural_network_model(x)
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y)) 
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        for epoch in range(hm_epochs):
            epoch_loss = 0
            i = 0
            while i < len(train_x):
                start = i
                end = i + batch_size

                batch_x = np.array(train_x[start:end])
                batch_y = np.array(train_y[start:end])

                _, c = sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})

                epoch_loss += c 
                i+=batch_size

            print('Epoch', epoch, 'completed out of', hm_epochs, 'loss:', epoch_loss)

        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y,1))        
        accuracy = tf.reduce_mean(tf.cast(correct,'float'))
        print('Accuracy', accuracy.eval({x:test_x, y:test_y}))

        # pos: [1,0] , argmax: 0
        # neg: [0,1] , argmax: 1
        features = get_features_for_input("This was the best store i've ever seen.")
        result = (sess.run(tf.argmax(prediction.eval(feed_dict={x:features}),1)))
        if result[0] == 0:
            print('Positive:',input_data)
        elif result[0] == 1:
            print('Negative:',input_data)

def get_features_for_input(input):
    current_words = word_tokenize(input.lower())
    current_words = [lemmatizer.lemmatize(i) for i in current_words]
    features = np.zeros(len(lexicon))

    for word in current_words:
        if word.lower() in lexicon:
            index_value = lexicon.index(word.lower())
            # OR DO +=1, test both
            features[index_value] += 1

    features = np.array(list(features))

train_neural_network(x)

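3 answers:

Answer 0 (score: 9)

Based on your comment above, it looks like your error ValueError: Cannot feed value of shape () occurs because features is None: your function get_features_for_input does not return anything.

I added the line return features and gave features the shape [1, len(lexicon)] so that it matches the shape of the placeholder.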

def get_features_for_input(input):
    current_words = word_tokenize(input.lower())
    current_words = [lemmatizer.lemmatize(i) for i in current_words]
    features = np.zeros((1, len(lexicon)))

    for word in current_words:
        if word.lower() in lexicon:
            index_value = lexicon.index(word.lower())
            # OR DO +=1, test both
            features[0, index_value] += 1

    return features
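
To see why the missing return leads to that error, here is a minimal sketch that does not need TensorFlow. It assumes a toy three-word lexicon and plain whitespace splitting in place of word_tokenize and the lemmatizer; broken_features and fixed_features are hypothetical names used only for illustration.

```python
import numpy as np

def broken_features(text, lexicon):
    features = np.zeros(len(lexicon))
    for word in text.lower().split():
        if word in lexicon:
            features[lexicon.index(word)] += 1
    # No return statement: Python implicitly returns None here.

def fixed_features(text, lexicon):
    # One row per example, so the result is 2-D: (1, len(lexicon)).
    features = np.zeros((1, len(lexicon)))
    for word in text.lower().split():
        if word in lexicon:
            features[0, lexicon.index(word)] += 1
    return features

toy_lexicon = ["best", "store", "worst"]  # assumption: stand-in for the real lexicon
broken = broken_features("best store", toy_lexicon)
fixed = fixed_features("best store", toy_lexicon)

print(broken is None)            # the broken version yields None
print(np.asarray(broken).shape)  # None converts to a zero-dimensional array
print(fixed.shape)               # the fixed version is a one-row batch
```

NumPy treats None as a zero-dimensional object, which is consistent with the shape () reported in the error message.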

Answer 1 (score: 3)

Your get_features_for_input function returns a single list representing the features of one sentence, but for feed_dict the input must have size [num_examples, features_size], where num_examples is 1 here.

The following code should work.
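
The code from this answer is not preserved in this copy. As a stand-in, here is a minimal sketch of the reshaping the answer describes, assuming get_features_for_input produces a 1-D NumPy vector; features_size and the sample values are made up for illustration.

```python
import numpy as np

features_size = 5  # hypothetical lexicon size, for illustration only

# A single sentence encoded as a 1-D vector of shape (features_size,).
single_example = np.zeros(features_size)
single_example[2] = 1

# Add a leading batch axis so the shape becomes (num_examples, features_size).
batch = single_example.reshape(1, -1)

print(single_example.shape)
print(batch.shape)
```

With TensorFlow 1.x, the reshaped array would then be passed as feed_dict={x: batch}.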

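Answer 2 (score: 2)

A basic principle of any machine learning algorithm is that the input dimensions must be the same during training and testing.

During training, you created a matrix of shape (number of training samples, len(lexicon)). You are using the bag-of-words approach here, and the lexicon is simply the set of unique words in the training data.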

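During testing, your input vector must have the same size as the training vectors, which is the size of the lexicon created during training. In addition, each element of the test vector corresponds to the word at the same index in the lexicon.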
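
The idea can be sketched in plain Python. This is an illustration under simplifying assumptions: whitespace splitting replaces word_tokenize and lemmatization, and build_lexicon and vectorize are hypothetical helpers, not functions from the tutorial.

```python
def build_lexicon(sentences):
    """Collect the unique words of the training sentences in a fixed order."""
    return sorted({word for s in sentences for word in s.lower().split()})

def vectorize(sentence, lexicon):
    """Count lexicon words in the sentence; position i counts lexicon[i]."""
    index = {word: i for i, word in enumerate(lexicon)}
    vector = [0] * len(lexicon)
    for word in sentence.lower().split():
        i = index.get(word)
        if i is not None:  # words outside the lexicon are ignored
            vector[i] += 1
    return vector

train_sentences = ["good store", "bad service"]
lexicon = build_lexicon(train_sentences)

# Training matrix: one row per sample, one column per lexicon word.
train_matrix = [vectorize(s, lexicon) for s in train_sentences]

# A test sentence is encoded with the same lexicon, so its length matches.
test_vector = vectorize("good good coffee", lexicon)

print(lexicon)
print(train_matrix)
print(test_vector)
```

Because the test sentence is encoded against the training lexicon, its vector has the same length as every training row, and the unseen word "coffee" simply contributes nothing.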

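Now to your question: in get_features_for_input(input) you use lexicon, which you must have defined somewhere in your program. Given the error, I conclude that your lexicon list is empty, so in get_features_for_input the line features = np.zeros(len(lexicon)) produces a zero-length array and no word in the loop ever matches.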
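
If an empty or mis-shaped feature array is suspected, a small guard before the feed can make the cause explicit. This is only a sketch; check_feed_shape is a hypothetical helper, not part of TensorFlow or the tutorial.

```python
import numpy as np

def check_feed_shape(features, n_features):
    """Return features as an array, or raise unless it is (num_examples, n_features)."""
    arr = np.asarray(features)
    if n_features == 0:
        raise ValueError("lexicon is empty; build it from the training data first")
    if arr.ndim != 2 or arr.shape[1] != n_features:
        raise ValueError(
            "expected shape (num_examples, %d), got %r" % (n_features, arr.shape)
        )
    return arr

lexicon = ["best", "store", "worst"]  # assumption: toy lexicon for illustration
ok = check_feed_shape(np.zeros((1, len(lexicon))), len(lexicon))
print(ok.shape)
```

Calling the guard with an empty lexicon, a 1-D vector, or None raises a ValueError that names the problem before the value reaches the placeholder.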

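A few suggested modifications:

The tutorial contains the function create_feature_sets_and_labels, which returns the cleaned, formatted training data. Change its return statement to return the lexicon list along with the data.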

return train_x,train_y,test_x,test_y,lexicon

Make a small change where the function is called, so that the lexicon list is collected as well:

train_x,train_y,test_x,test_y,lexicon = create_feature_sets_and_labels('/path/to/pos.txt','/path/to/neg.txt')

Then pass this lexicon list, together with your input, to the get_features_for_input function:

features = get_features_for_input("This was the best store i've ever seen.",lexicon)

Make a small change in the get_features_for_input function:

def get_features_for_input(text,lexicon):
    featureset = []
    current_words = word_tokenize(text.lower())
    current_words = [lemmatizer.lemmatize(i) for i in current_words]
    features = np.zeros(len(lexicon))
    for word in current_words:
        if word.lower() in lexicon:
            index_value = lexicon.index(word.lower())
            features[index_value] += 1
    featureset.append(features)
    return np.asarray(featureset)
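
Once features has shape (1, len(lexicon)), the network output for one sentence is a row of two scores. The label convention from the comments in the question (pos: [1,0], argmax 0; neg: [0,1], argmax 1) could be decoded as sketched below. The scores are made-up numbers standing in for the network output, and decode is a hypothetical helper.

```python
import numpy as np

# Label convention taken from the comments in the question's code.
LABELS = {0: "Positive", 1: "Negative"}

def decode(scores):
    """Map each row of class scores to its label via argmax along axis 1."""
    scores = np.asarray(scores)
    if scores.ndim != 2 or scores.shape[1] != len(LABELS):
        raise ValueError("expected shape (num_examples, %d)" % len(LABELS))
    return [LABELS[int(i)] for i in np.argmax(scores, axis=1)]

fake_scores = np.array([[2.3, -1.1]])  # assumption: placeholder for real output
print(decode(fake_scores))
```

In the TensorFlow 1.x session from the question, the class index should also be obtainable directly with sess.run(tf.argmax(prediction, 1), feed_dict={x: features}).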