"Can not convert a ndarray into a Tensor or Operation" when using a pandas DataFrame in TensorFlow

Time: 2018-02-19 06:14:12

Tags: pandas dataframe tensorflow machine-learning knn

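I'm learning some basic TensorFlow, but I've run into a problem. I'm trying to load data from a file with pandas and then run K-nearest neighbors on the dataset, but I keep hitting an error.

It seems that TensorFlow is incompatible with NumPy's ndarray, and I've been stuck on this for two days now. What is the best way to load data from a CSV file into TensorFlow?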

  

TypeError: Fetch argument array([-70.837845, -62.241467, -37.82856, -55.596767], dtype=float32) has invalid type <class 'numpy.ndarray'>, must be a string or Tensor. (Can not convert a ndarray into a Tensor or Operation.)

import numpy as np
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import pandas as pd
import tensorflow as tf
import time

Neighbors = 4
Training_step=1

data_frame=pd.read_csv('./data/breast-cancer-wisconsin.csv',encoding='gbk')
# replace missing data with outlier inplace
data_frame.replace('?',-99999,inplace=True)
Y=np.array(data_frame['class'])

data_frame.drop(['id'],axis=1,inplace=True)
X=np.array(data_frame.drop(['class'],axis=1))

# splits dataset for cross validation
x_train,x_test,y_train,y_test=train_test_split(X,Y,test_size=0.3,random_state=0)
y_train.shape=(489,1)

# tf Graph Input
x_training = tf.placeholder("float",[None,9],name="x_training_ph")
y_training = tf.placeholder("float",[None,1],name="y_training_ph")
x_testing = tf.placeholder("float",[9],name="x_testing_ph")

eucli_distance = tf.negative(tf.sqrt(tf.reduce_sum(tf.square(tf.subtract(x_training, x_testing)), axis=0)))

values, indices = tf.nn.top_k(eucli_distance, k=Neighbors, sorted=False)

nearest_neighbors = []
for i in range(Neighbors):
    #Returns the index with the largest value across axes of a tensor.
    nearest_neighbors.append(tf.argmax(y_training[indices[i]], 0))

#stack the tensor together
neighbors_tensor = tf.stack(nearest_neighbors)

# Returns a tensor y containing all of the unique elements of x, sorted in the same order that they occur in x.
# This operation also returns a tensor idx the same size as x that contains the index of each value of x in the unique output y.
y, idx, count = tf.unique_with_counts(neighbors_tensor)

# This operation extracts a slice of size `size` from a tensor input, starting at the location specified by `begin`.
# Get the closest neighbor
pred = tf.slice(y, begin=[tf.argmax(count, 0)], size=tf.constant([1], dtype=tf.int64))[0]

accuracy = 0.

# Initializing the variables
init = tf.global_variables_initializer()

start_time=time.time()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)

    # loop over test data
    for i in range(len(x_test)):
        # Get nearest neighbor
        # Feed the placeholders

        nn_index = sess.run(pred, feed_dict={x_training: x_train, y_training: y_train, x_testing: x_test[i, :]})
        distance = sess.run(eucli_distance, feed_dict={x_training: x_train, y_training: y_train, x_testing: x_test[i, :]})
        print("Distance is ", len(distance), " ", distance)
        values = sess.run(values, feed_dict={x_training: x_train, y_training: y_train, x_testing: x_test[i, :]})
        print("Value is ", len(values), " ", values)
        print("Case:", i, "Prediction:", nn_index,
              "True label", np.argmax(y_test[i]))
        # Calculate accuracy
        if nn_index == np.argmax(y_test[i]):
            accuracy += 1. / len(x_test)
        else:
            print("Not matched")
print("==========================================")
print('Neighbors:',Neighbors)
print('Training step:',Training_step)
print("Time used: %s second" % (time.time() - start_time))
print("Accuracy:", accuracy)

The dataset I'm using comes from UCI and looks like this:

id,clump_thickness,unif_cell_size,unif_cell_shape,marg_adhesion,single_epith_cell_size,bare_nuclei,bland_chromatin,normal_nucleoli,mitoses,class
1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1015425,3,1,1,1,2,2,3,1,1,2
1016277,6,8,8,1,3,4,3,7,1,2
1017023,4,1,1,3,2,1,3,1,1,2
1017122,8,10,10,8,7,10,9,7,1,4
1018099,1,1,1,1,2,10,3,1,1,2

1 Answer:

Answer 0 (score: 0)

You are redefining values, which was originally the top_k tensor:

values, indices = tf.nn.top_k(eucli_distance, k=Neighbors, sorted=False)

...and then becomes the evaluated result, which is an np.ndarray:

values = sess.run(values, feed_dict={...})

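So on the second loop iteration, values is already an np.ndarray rather than a tensor, and TensorFlow cannot work out what sess.run(values) is supposed to fetch. Just pick a different variable name for the result.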
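
To make the failure mode concrete without needing TensorFlow installed, here is a minimal sketch of the same name-shadowing pattern. FakeTensor and FakeSession are hypothetical stand-ins written for this illustration, not TensorFlow classes; they only imitate the one rule that matters here, namely that a session can fetch a graph handle but not already-evaluated data. The name values_out is likewise just an example.

```python
class FakeTensor:
    """Hypothetical stand-in for a graph tensor: a named handle, not data."""

    def __init__(self, name, data):
        self.name = name
        self._data = data


class FakeSession:
    """Hypothetical stand-in for a session: only tensor handles can be fetched."""

    def run(self, fetch):
        if not isinstance(fetch, FakeTensor):
            raise TypeError(
                "Fetch argument %r has invalid type %r, must be a Tensor."
                % (fetch, type(fetch))
            )
        # The evaluated result is plain data, no longer a handle.
        return list(fetch._data)


def buggy_loop(sess, values, steps):
    """Rebinds `values` to the fetched result, as in the question."""
    for _ in range(steps):
        # On the second pass `values` is a list, so run() raises TypeError.
        values = sess.run(values)
    return values


def fixed_loop(sess, values, steps):
    """Keeps the handle in `values` and stores results under another name."""
    results = []
    for _ in range(steps):
        values_out = sess.run(values)  # `values` still refers to the handle
        results.append(values_out)
    return results


sess = FakeSession()
top_k_values = FakeTensor("top_k_values", [-70.8, -62.2, -37.8, -55.6])

try:
    buggy_loop(sess, top_k_values, steps=2)
    buggy_failed = False
except TypeError:
    buggy_failed = True

fixed_results = fixed_loop(sess, top_k_values, steps=2)
print(buggy_failed, len(fixed_results))
```

In the question's loop, the equivalent change would be to assign the result of sess.run(values, feed_dict=...) to a new name (for example values_out) and print that, leaving values bound to the tensor returned by tf.nn.top_k.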