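ValueError: labels shape must be [batch_size, labels_dimension], got (128, 2)

Time: 2017-08-31 18:29:06

Tags: python pandas numpy tensorflow

I am using TensorFlow 1.3.0 with Python 3.5.2. I am trying to mimic the DNNClassifier usage from the Iris data tutorial on the TensorFlow website and have run into difficulties. I import a CSV file with about 155 rows and 15 columns of data, split it into training and test data (I am trying to classify positive or negative movement), and get an error when I start training my classifier. Here is how the data is set up: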

    #import values from csv
    mexicof1 = pd.read_csv('Source/mexicoR.csv')

    #construct pandas dataframe
    mexico_df = pd.DataFrame(mexicof1)
    #start counting from mexico.mat.2.nrow.mexico.mat...1.
    mexico_dff = pd.DataFrame(mexico_df.iloc[:,1:16])
    mexico_dff.columns = ['tp1_delta','PC1','PC2','PC3','PC4','PC5','PC6','PC7', \
                  'PC8', 'PC9', 'PC10', 'PC11', 'PC12', 'PC13', 'PC14']


    #binary assignment for positive/negative values
    for i in range(0,155):
        if(mexico_dff.iloc[i,0] > 0):
            mexico_dff.iloc[i,0] = "pos"
        else:
            mexico_dff.iloc[i,0] = "neg"

    #up movement vs. down movement classification set up
    up = np.asarray([1,0])
    down = np.asarray([0,1])
    mexico_dff['tp1_delta'] = mexico_dff['tp1_delta'].map({"pos": up, "neg": down})


    #Break into training and test data
    #data: independent values
    #labels: classification
    mexico_train_DNN1data = mexico_dff.iloc[0:150, 1:15]
    mexico_train_DNN1labels = mexico_dff.iloc[0:150, 0]
    mexico_test_DNN1data = mexico_dff.iloc[150:156, 1:15]
    mexico_test_DNN1labels = mexico_dff.iloc[150:156, 0]

    #Construct numpy arrays for test data
    temptrain = []
    for i in range(0, len(mexico_train_DNN1labels)):
        temptrain.append(mexico_train_DNN1labels.iloc[i])
    temptrainFIN = np.array(temptrain, dtype = np.float32)

    temptest = []
    for i in range(0, len(mexico_test_DNN1labels)):
        temptest.append(mexico_test_DNN1labels.iloc[i])
    temptestFIN = np.array(temptest, dtype = np.float32)

    #set up NumPy arrays
    mTrainDat = np.array(mexico_train_DNN1data, dtype = np.float32)
    mTrainLab = temptrainFIN
    mTestDat = np.array(mexico_test_DNN1data, dtype = np.float32)
    mTestLab = temptestFIN

This produces data that looks like the following:

    #Independent value output
    mTestDat
    Out[289]: 
    array([[-0.08404002, -3.07483053,  0.41106853, ..., -0.08682428,
             0.32954004, -0.36451185],
           [-0.31538665, -2.23493481,  1.97653472, ...,  0.35220796,
             0.09061374, -0.59035355],
           [ 0.44257978, -3.04786181, -0.6633662 , ...,  1.34870672,
             0.43879321,  0.26306254],
           ...,
           [ 2.38574553,  0.09045095, -0.09710167, ...,  1.20889878,
             0.00937434, -0.06398607],
           [ 1.68626559,  0.65349185,  0.23625408, ..., -1.16267788,
             0.45464727, -1.14916229],
           [ 1.58263958,  0.1223636 , -0.12084256, ...,  0.7947616 ,
            -0.47359121,  0.28013545]], dtype=float32)

    #Classification labels (up or down movement) output
    mTestLab
    Out[290]: 
    array([[ 0.,  1.],
           [ 0.,  1.],
           [ 0.,  1.],
           [ 1.,  0.],
           [ 0.,  1.],
           [ 1.,  0.],
           ........
           [ 1.,  0.],
           [ 0.,  1.],
           [ 0.,  1.],
           [ 0.,  1.]], dtype=float32)

Following the tutorial with this setup, I can run the code up to the classifier.train() call, where it stops and gives the following error:

    # Specify that all features have real-value data
    feature_columns = [tf.feature_column.numeric_column("x", shape=[mexico_train_DNN1data.shape[1]])]

    # Build 3 layer DNN with 10, 20, 10 units respectively.
    classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                    hidden_units=[10, 20, 10],
                                    optimizer = tf.train.AdamOptimizer(0.01),
                                    n_classes=2) #representing either an up or down movement


    train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x = {"x": mTrainDat},
    y = mTrainLab,
    num_epochs = None,
    shuffle = True)

    #Now, we train the model
    classifier.train(input_fn=train_input_fn, steps = 2000)


      File "Source\Anaconda3\envs\tensorflow\lib\site-packages\tensorflow\python\estimator\canned\head.py", line 174, in _check_labels
        (static_shape,))

    ValueError: labels shape must be [batch_size, labels_dimension], got (128, 2).

I am not sure why I am getting this error; any help would be much appreciated.

1 Answer:

Answer 0: (score: 1)

You are using one-hot encoded labels ([1, 0] or [0, 1]) where DNNClassifier expects class labels (i.e. 0 or 1). To decode a one-hot encoding along the last axis, use np.argmax with axis=-1.

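Note that for the binary case a specialized conversion might be faster, although the performance difference won't be large; I would probably use the more general version in case you add another class later.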
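To make the trade-off concrete, here is a minimal sketch comparing the general decode with one possible binary-only shortcut. The shortcut (reading the second column) and the sample values are assumptions for illustration, not necessarily what the answer had in mind; the array layout follows the question, where [1, 0] means up and [0, 1] means down.

```python
import numpy as np

# Hypothetical one-hot labels in the question's layout:
# [1, 0] = up movement, [0, 1] = down movement.
one_hot_vector = np.array(
    [[0., 1.], [1., 0.], [0., 1.], [0., 1.]], dtype=np.float32)

# General version: works for any number of classes.
general_labels = np.argmax(one_hot_vector, axis=-1)

# Binary-only shortcut (assumed): with exactly two classes, the second
# column already equals the class index, so no search is needed.
binary_labels = one_hot_vector[..., 1].astype(np.int64)

print(general_labels.tolist())
print(binary_labels.tolist())
```

Both give the same class indices for two classes, but only the argmax form keeps working unchanged if a third class is added.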

In your case, after generating your NumPy labels, just add the following (with your label array in place of one_hot_vector):

    class_labels = np.argmax(one_hot_vector, axis=-1)
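Applied to the question's variables, that could look roughly like the sketch below. The array values here are stand-ins (the real ones come from the CSV), and only the variable names mTrainLab and mTestLab are taken from the question.

```python
import numpy as np

# Stand-ins for the question's label arrays (hypothetical values):
# each row is [1, 0] for an up move or [0, 1] for a down move.
mTrainLab = np.array(
    [[0., 1.], [0., 1.], [1., 0.], [0., 1.], [1., 0.]], dtype=np.float32)
mTestLab = np.array([[1., 0.], [0., 1.]], dtype=np.float32)

print(mTrainLab.shape)  # 2-D: one column per class

# Collapse each one-hot row to the index of its 1 entry.
mTrainLab = np.argmax(mTrainLab, axis=-1)
mTestLab = np.argmax(mTestLab, axis=-1)

print(mTrainLab.shape)  # 1-D: one class index per example
print(mTrainLab.tolist())
```

With the question's encoding, an up move becomes class 0 and a down move becomes class 1. The converted arrays are what would then be passed as y to numpy_input_fn.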