Question

I am trying to use string data with a CNN, which requires converting an array of m strings into an m * 200 * 36 * 1 matrix that the CNN then consumes.

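The code below works for me, but I would like this data preparation to be part of the TensorFlow graph. I am currently using Keras to consume the data, but as far as I know Keras is built on top of TensorFlow, so maybe Keras can consume a TensorFlow graph as well? I am also considering using TensorFlow instead of Keras in the future; I only started with Keras because it seemed simpler.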

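If possible, I would like to incorporate this into the actual NN so that:

I don't run out of memory

I can play with the mini-batch and shuffle parameters while converting only the data needed for a particular mini-batch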
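
For a sense of scale on the memory concern, here is a rough estimate of the size of the fully converted array. The dataset size `num_strings` is a hypothetical figure chosen only for illustration; the 200 and 36 come from the shape described above.

```python
# Rough memory estimate for the dense one-hot array.
# num_strings is a hypothetical dataset size, used only for illustration.
num_strings = 1_000_000
max_len, alphabet_size = 200, 36

bytes_per_string_f32 = max_len * alphabet_size * 4  # float32: 4 bytes per cell
bytes_per_string_f64 = max_len * alphabet_size * 8  # np.zeros defaults to float64

total_gb_f32 = num_strings * bytes_per_string_f32 / 1e9
total_gb_f64 = num_strings * bytes_per_string_f64 / 1e9

print(bytes_per_string_f32, total_gb_f32)
print(bytes_per_string_f64, total_gb_f64)
```

Each string costs 28,800 bytes as float32, so converting everything up front grows quickly with m, while converting one mini-batch at a time keeps only batch_size * 28,800 bytes of input in memory.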

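    import string
    import numpy as np

    def string_to_bit(text):
        # One-hot encode a string into a (200, 36) array:
        # one row per supported character (digits, then lowercase letters).
        char_values = '0123456789' + string.ascii_lowercase
        d = len(char_values)
        a = np.zeros((200, d))
        j = 0
        c = True  # False while inside a run of unsupported characters
        for ch in text.lower():
            if j >= a.shape[0]:
                break  # ignore anything beyond 200 rows
            idx = char_values.find(ch)
            if idx >= 0:
                a[j][idx] = 1
                j = j + 1
                c = True
            else:
                # Leave a single all-zero row per run of unsupported characters.
                if c:
                    j = j + 1
                c = False
        return a

    def string_to_bit_array(array, d=36):
        # Stack the per-string encodings into an (m, 200, d) array.
        out = np.zeros((len(array), 200, d))
        for i in range(len(array)):
            out[i] = string_to_bit(array[i])
        return out

    data = np.asarray(['Hello World', 'abc 123'])

    m, l, d = len(data), 200, 36

    data_transformed = string_to_bit_array(data).reshape(m, l, d, 1).astype('float32')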

In this case m = 2, so the output should have shape 2 * 200 * 36 * 1. This is what the argmax should be:

    from numpy import argmax
    print(argmax(data_transformed[:,:11], axis =2))

    [[[17]
      [14]
      [21]
      [21]
      [24]
      [ 0]
      [32]
      [24]
      [27]
      [21]
      [13]]

     [[10]
      [11]
      [12]
      [ 0]
      [ 1]
      [ 2]
      [ 3]
      [ 0]
      [ 0]
      [ 0]
      [ 0]]]
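
The conversion above boils down to three array steps (collapse runs of unsupported characters, look up each character's index, one-hot the indices), which is the kind of thing I would hope can be expressed as graph operations. Below is a minimal NumPy sketch of that decomposition, not a TensorFlow implementation. The names `ALPHABET`, `MAX_LEN`, `LOOKUP` and `encode_batch` are introduced here only for illustration.

```python
import re
import string
import numpy as np

ALPHABET = '0123456789' + string.ascii_lowercase  # 36 symbols
MAX_LEN = 200

# Lookup table from byte value to alphabet index; -1 marks unsupported bytes.
LOOKUP = np.full(256, -1, dtype=np.int64)
LOOKUP[np.frombuffer(ALPHABET.encode('ascii'), dtype=np.uint8)] = np.arange(len(ALPHABET))

def encode_batch(strings, max_len=MAX_LEN):
    """Return a float32 array of shape (len(strings), max_len, 36, 1)."""
    idx = np.full((len(strings), max_len), -1, dtype=np.int64)
    for row, text in enumerate(strings):
        # Collapse each run of unsupported characters into one space,
        # which ends up as a single all-zero row.
        cleaned = re.sub('[^0-9a-z]+', ' ', text.lower())[:max_len]
        codes = np.frombuffer(cleaned.encode('ascii'), dtype=np.uint8)
        idx[row, :len(codes)] = LOOKUP[codes]
    # One-hot by comparing each index against 0..35; -1 matches nothing.
    one_hot = (idx[..., None] == np.arange(len(ALPHABET))).astype('float32')
    return one_hot[..., None]

batch = encode_batch(['Hello World', 'abc 123'])
print(batch.shape)
print(np.argmax(batch[:, :11], axis=2))
```

Padding positions and unsupported characters both use index -1, so they become all-zero rows and their argmax is 0, as in the output shown above.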

The first layer of my NN will have input_shape = (200, 36, 1):

    from keras.models import Sequential
    from keras.layers import Conv2D

    model = Sequential()
    model.add(Conv2D(36, (3, 36), strides=(1,1), activation='tanh', padding='valid', input_shape=(200,36,1)))

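I am not sure whether all of this is asking too much, but note that the input shape is (200, 36, 1), with m adjusted according to the batch size, which is a parameter passed later to the Keras fit call: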

    model.fit(data_transformed , target_label, epochs=10, batch_size=1, verbose=1)
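
To make the mini-batch goal concrete, here is a minimal sketch of what I have in mind outside the graph: a plain Python generator that encodes only the strings belonging to the current batch. The names `encode` and `batch_generator`, the `seed` argument, and the example labels are assumptions made for illustration.

```python
import string
import numpy as np

ALPHABET = '0123456789' + string.ascii_lowercase

def encode(text, max_len=200):
    """One row per supported character, one blank row per run of other characters."""
    a = np.zeros((max_len, len(ALPHABET)), dtype='float32')
    j, in_gap = 0, False
    for ch in text.lower():
        if j >= max_len:
            break
        k = ALPHABET.find(ch)
        if k >= 0:
            a[j, k] = 1.0
            j += 1
            in_gap = False
        elif not in_gap:
            j += 1
            in_gap = True
    return a

def batch_generator(strings, labels, batch_size, shuffle=True, seed=0):
    """Yield (x, y) mini-batches forever, encoding only the strings in each batch."""
    rng = np.random.RandomState(seed)
    n = len(strings)
    while True:
        order = rng.permutation(n) if shuffle else np.arange(n)
        for start in range(0, n, batch_size):
            pick = order[start:start + batch_size]
            x = np.stack([encode(strings[i]) for i in pick])[..., None]
            y = np.asarray([labels[i] for i in pick])
            yield x, y

# Hypothetical data: three strings with made-up binary labels.
gen = batch_generator(['Hello World', 'abc 123', 'foo'], [0, 1, 0],
                      batch_size=2, shuffle=False)
x, y = next(gen)
print(x.shape, y)
```

Keras can train from a generator like this (fit_generator with steps_per_epoch in Keras 2, or fit in newer versions), so only one batch of shape (batch_size, 200, 36, 1) is held in memory at a time. This still runs in Python rather than inside the TensorFlow graph, which is the part I am asking about.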

I have not actually used TensorFlow, but I imagine it would take the batch size as an input parameter in sess().

Tensorflow String to Bit representation

0 answers: