Question

I am trying to use the MNIST data for my research work. The dataset is described as follows:

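The ``training_data`` is returned as a tuple with two entries. The first entry contains the actual training images. This is a numpy ndarray with 50,000 entries. Each entry is, in turn, a numpy ndarray with 784 values, representing the 28 * 28 = 784 pixels in a single MNIST image.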
The second entry in the ``training_data`` tuple is a numpy ndarray
containing 50,000 entries.  Those entries are just the digit
values (0...9) for the corresponding images contained in the first
entry of the tuple.
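
To make the layout described above concrete, here is a small sketch that builds a tuple with the same structure from random stand-in data. The array contents and the size `n = 5` are made up for illustration; the real data comes from `load_data()` and has 50,000 entries.

```python
import numpy as np

n = 5  # tiny stand-in for the 50,000 entries in the real data set

# First entry: one row of 784 pixel values per image
images = np.random.rand(n, 784)
# Second entry: the digit (0..9) for each image
labels = np.random.randint(0, 10, size=n)

training_data = (images, labels)

# A single image can be viewed as 28 x 28 pixels again
first_image = training_data[0][0].reshape(28, 28)
print(training_data[0].shape, training_data[1].shape, first_image.shape)
```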

Now I am converting the training data as follows:

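In particular, ``training_data`` is a list containing 50,000 2-tuples ``(x, y)``. ``x`` is a 784-dimensional numpy.ndarray containing the input image. ``y`` is a 10-dimensional numpy.ndarray representing the unit vector corresponding to the correct digit for ``x``. The code is: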

def load_data_nn():
    training_data, validation_data, test_data = load_data()
    #print training_data[0][1]
    #inputs = [np.reshape(x, (784, 1)) for x in training_data[0]]
    inputs = [np.reshape(x, (784,1)) for x in training_data[0]]
    print inputs[0]
    results = [vectorized_result(y) for y in training_data[1]]
    training_data = zip(inputs, results)
    test_inputs = [np.reshape(x, (784, 1)) for x in test_data[0]]
    return (training_data, test_inputs, test_data[1])
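
The helper `vectorized_result` is called above but not shown in the question. A minimal sketch of what it might look like, assuming it returns a column vector of shape (10, 1) with a 1.0 at the position of the digit (the shape is an assumption; the description only says "10-dimensional"):

```python
import numpy as np

def vectorized_result(j):
    """Return a 10-dimensional unit vector with 1.0 at index j.

    Hypothetical reconstruction; the (10, 1) shape is an assumption.
    """
    e = np.zeros((10, 1))
    e[j] = 1.0
    return e

# Stand-in for one flattened image and its label
x = np.zeros(784).reshape((784, 1))
y = vectorized_result(3)
print(x.shape, y.shape)
print(y.ravel())
```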

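Now I want to write inputs to a text file: one line should hold inputs[0], the next line inputs[1], and so on. The values within each entry should be separated by spaces, without the ndarray brackets. For example: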

 0 0.45 0.47 0.76

 0.78 0.34 0.35 0.56

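Here each line of the text file is one entry of inputs (the first line is inputs[0]). How can I write the ndarray to a file in the format shown above?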

Answer 1

Since the answer to your question seems fairly easy, I guess your real concern is speed. Fortunately, we can use multiprocessing here. Try this:

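from multiprocessing import Pool

import numpy as np

def joinRow(row):
    # Flatten first: each input has shape (784, 1), so iterating over it
    # directly would yield 1-element arrays that print with brackets.
    return ' '.join(str(cell) for cell in np.ravel(row))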

def inputsToFile(inputs, filepath):
    # in python3 you can do:
    # with Pool() as p:
    #     lines = p.map(joinRow, inputs, chunksize=1000)
    # instead of code from here...
    p = Pool()
    try:
        lines = p.map(joinRow, inputs, chunksize=1000)
    finally:
        p.close()
    # ...to here. But this works for both.

    with open(filepath,'w') as f:
        f.write('\n'.join(lines)) # joining already created strings goes fast

This still takes a while on my poor laptop, but it is much faster than '\n'.join(' '.join(str(cell) for cell in row) for row in inputs).
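
For comparison, a single-process alternative is to let NumPy do the formatting with `np.savetxt`. This is only a sketch, not a benchmarked replacement: the function name `inputs_to_file_savetxt`, the demo values, and the temporary file path are made up for illustration, and `fmt='%g'` keeps only about six significant digits.

```python
import os
import tempfile

import numpy as np

def inputs_to_file_savetxt(inputs, filepath):
    # Stack the inputs into one 2-D array with one row per image;
    # this also drops the trailing axis of (784, 1)-shaped entries.
    arr = np.asarray(inputs).reshape(len(inputs), -1)
    # One row per line, values separated by single spaces, no brackets
    np.savetxt(filepath, arr, fmt='%g', delimiter=' ')

# Tiny stand-in for the real inputs (4 values per row instead of 784)
demo_inputs = [np.array([[0.0], [0.45], [0.47], [0.76]]),
               np.array([[0.78], [0.34], [0.35], [0.56]])]
demo_path = os.path.join(tempfile.gettempdir(), 'inputs_demo.txt')
inputs_to_file_savetxt(demo_inputs, demo_path)

with open(demo_path) as f:
    text = f.read()
print(text)
```

If full precision matters, a wider format such as `fmt='%.18g'` can be passed instead.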

By the way, you can also speed up the rest of your code:

def load_data_nn():
    training_data, validation_data, test_data = load_data()
    # works whether training_data[0] has shape (50000, 784) or (50000, 28, 28);
    # -1 lets numpy infer the number of images
    inputs = training_data[0].reshape((-1, 784, 1))
    print(inputs[0])
    # create identity matrix and use entries of training_data[1] to
    # index corresponding unit vectors
    results = np.eye(10)[training_data[1]]
    training_data = zip(inputs, results)
    # same for the test images, whose count may differ from the training set
    test_inputs = test_data[0].reshape((-1, 784, 1))
    return (training_data, test_inputs, test_data[1])
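
The identity-matrix trick may be easier to follow on a tiny example. The sketch below uses three made-up labels and blank images as stand-ins for the real data; indexing `np.eye(10)` with an integer array picks one row (one unit vector) per label, and passing `-1` to `reshape` lets NumPy infer the number of images.

```python
import numpy as np

labels = np.array([3, 0, 9])        # stand-in for training_data[1]
one_hot = np.eye(10)[labels]        # one unit vector per label, shape (3, 10)
print(one_hot)

images = np.zeros((3, 28, 28))      # stand-in for three 28 x 28 images
flat = images.reshape((-1, 784, 1)) # -1 infers the number of images
print(flat.shape)
```

Note that each row of `one_hot` has shape (10,), so if the rest of the code expects column vectors of shape (10, 1), an extra `reshape((-1, 10, 1))` would be needed.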

How do I write an ndarray to a text file in Python?
