Question

I have downloaded the MNIST digit dataset from a Kaggle competition and want to prepare the 'train.csv' file so that I can train my neural network. The 'train.csv' file has 42000 rows and 785 columns. Each row represents an image: the first column contains the label, i.e. the digit shown in the image, and the remaining columns are the pixel values of the 28x28 image of that digit.

I want to be able to store the first column as a 'training_result' vector and the rest as 'training_inputs' matrix.

So first I load the csv file using pandas.

from pandas import read_csv
data = read_csv("train.csv")

Now, to create the training_result vector, I tried this:

training_result = data[0:42001][0:1]
>>training_result.shape
(1,785)

So I am getting one row x 785 columns instead of 42000 rows x one column. Is there a mistake in the slicing operation?

Also for getting training_inputs, I tried

training_inputs = data[0:42001][1:785]
>>training_inputs.shape
(784,785)

I get 784 rows x 785 columns instead of 42000 rows x 784 columns.

How can I rectify this mistake?

Answer 1

training_result = data.iloc[:, 0].values
training_inputs = data.iloc[:, 1:].values
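
A minimal sketch of splitting by position with .iloc, assuming the Kaggle layout of one label column followed by the pixel columns. The small frame and its column names below are made up as a stand-in for train.csv:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv: 5 rows, 1 label column + 4 pixel columns.
data = pd.DataFrame(
    np.arange(25).reshape(5, 5),
    columns=["label", "pixel0", "pixel1", "pixel2", "pixel3"],
)

# .iloc indexes by position as [rows, columns]; ":" keeps every row.
training_result = data.iloc[:, 0].values   # first column only
training_inputs = data.iloc[:, 1:].values  # every column after the first

print(training_result.shape)  # (5,)
print(training_inputs.shape)  # (5, 4)
```

On the real file the same two lines should give shapes of (42000,) and (42000, 784), provided the CSV was read with its header row.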

Answer 2

I would first check the shape of data to see if it is correct. If it is, then I think you should interchange the way you are indexing, so that the columns are selected first and the rows second.

training_result = data[data.columns[0:1]][0:42000]

and

training_inputs = data[data.columns[1:785]][0:42000]
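
To see what each bracket does, here is a small sketch on a made-up frame (hypothetical data, not the real train.csv). A slice inside plain brackets selects rows, so chaining two slices just slices the rows twice, while passing column labels selects columns:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv: 10 rows, 1 label column + 3 pixel columns.
data = pd.DataFrame(
    np.zeros((10, 4), dtype=int),
    columns=["label", "pixel0", "pixel1", "pixel2"],
)
print(data.shape)  # (10, 4)

# Both slices act on rows: the second one keeps only the first row.
rows_twice = data[0:10][0:1]
print(rows_twice.shape)  # (1, 4)

# Column labels first, then a row slice.
cols_then_rows = data[data.columns[0:1]][0:10]
print(cols_then_rows.shape)  # (10, 1)
```

This matches the (1,785) shape reported in the question: the second slice never touched the columns.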

Answer 3

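I finally understood the mistake I was making. Basically, I was trying to slice along two dimensions at once, but the syntax I used was wrong.

So, if I want my training_inputs variable to get 42000 rows and 784 columns from the dataset 'data', I should do the following: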

training_inputs = data.values[0:42000, 1:785]
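
For reference, a minimal sketch of this kind of two-dimensional slicing on a small made-up frame. A pandas DataFrame does not accept `data[rows, cols]` directly, so the sketch assumes the frame is first converted to a NumPy array with .values (using .iloc on the frame would be the other option). The frame and column names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv: 3 rows, 1 label column + 3 pixel columns.
data = pd.DataFrame(
    np.arange(12).reshape(3, 4),
    columns=["label", "pixel0", "pixel1", "pixel2"],
)

# Convert to a plain NumPy array, which supports [row_slice, column_slice].
array = data.values

training_result = array[:, 0]    # labels: column 0, all rows
training_inputs = array[:, 1:4]  # pixels: columns 1..3, all rows

print(training_result.shape)  # (3,)
print(training_inputs.shape)  # (3, 3)
```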

Converting csv into numpy array
