Error when trying to tokenize text in Keras?

Posted: 2018-12-24 13:23:45

Tags: python numpy keras

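I'm new to Keras and deep learning, but I'm following an online guide. I'm trying to tokenize my text so that I can read its "shape" and use it as the "input_shape" when creating the layers of my neural network. Here is my code so far: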

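import numpy as np
import pandas as pd
from keras.preprocessing.text import Tokenizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

df = pd.read_csv(pathname, encoding = "ISO-8859-1")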
df = df[['content_cleaned', 'meaningful']]
df = df.sample(frac=1)

#Transposed columns into numpy arrays 
X = np.asarray(df[['content_cleaned']])
y = np.asarray(df[['meaningful']])

#Split into training and testing set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=21) 

# Create tokenizer
tokenizer = Tokenizer(num_words=100) #No row has more than 100 words.

#Tokenize the predictors (text)
X_train = np.concatenate(tokenizer.sequences_to_matrix(int(X_train), mode="binary"))
X_test = np.concatenate(tokenizer.sequences_to_matrix(int(X_test), mode="binary"))

#Convert the labels to the binary
encoder = LabelBinarizer()
encoder.fit(y_train) 
y_train = encoder.transform(y_train)
y_test = encoder.transform(y_test)

The error points to this line:

X_train = tokenizer.sequences_to_matrix(int(X_train), mode="binary")

The error message is:

TypeError: only length-1 arrays can be converted to Python scalars

Can anyone spot my mistake and perhaps suggest a fix? I'm very new to this and can't figure it out.

I want to be able to call X_train.shape so that I can pass it as input_shape when creating the layers of the network.

Any help would be great!
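For reference, a minimal sketch of what a binary bag-of-words matrix looks like, written in plain Python and NumPy so it runs without Keras. It only loosely mirrors calling Tokenizer.fit_on_texts and then texts_to_matrix(texts, mode="binary") in Keras: the helper names build_word_index and texts_to_binary_matrix are hypothetical, words are split on whitespace only, and frequency ties are broken alphabetically (assumptions, not Keras behavior). In Keras, num_words caps the vocabulary size (only the most frequent num_words - 1 words are kept), not the number of words per row, and the resulting matrix has shape (number_of_texts, num_words), so X_train.shape[1] is the value that would go into input_shape.

```python
from collections import Counter

import numpy as np


def build_word_index(texts):
    # Hypothetical helper: rank words by frequency (most frequent gets index 1).
    # Index 0 is left unused, loosely mirroring the Keras Tokenizer convention.
    # Assumption: plain whitespace split, ties broken alphabetically.
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i + 1 for i, word in enumerate(ranked)}


def texts_to_binary_matrix(texts, word_index, num_words):
    # Hypothetical helper: one row per text, one column per word index.
    # A cell is 1.0 if that word occurs in the text; words whose index is
    # >= num_words (i.e. outside the vocabulary cap) are ignored.
    matrix = np.zeros((len(texts), num_words), dtype=np.float32)
    for row, text in enumerate(texts):
        for word in text.lower().split():
            idx = word_index.get(word)
            if idx is not None and idx < num_words:
                matrix[row, idx] = 1.0
    return matrix


if __name__ == "__main__":
    texts = ["the cat sat", "the dog ran", "a cat"]
    word_index = build_word_index(texts)
    X = texts_to_binary_matrix(texts, word_index, num_words=5)
    print(X.shape)  # (3, 5): X.shape[1] is the per-sample feature count
```

In a real train/test split, the word index would be built from the training texts only and then reused to encode the test texts, so both matrices have the same number of columns.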

1 answer:

Answer 0 (score: 0)

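You are trying to convert a NumPy array into a single Python integer, which is not possible and produces this error (the error has nothing to do with Keras). What you actually want is to change the dtype of the NumPy array to int. Try the following: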

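X_train = X_train.astype(np.int32)

instead of int(X_train). Note that astype returns a new array rather than converting in place, so the result has to be assigned back.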
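A small NumPy-only sketch of the answer's point, using made-up arrays in place of the question's data: int() fails on an array with more than one element, while astype converts element by element and returns a new array. As an extra caveat beyond the answer, astype(np.int32) only succeeds if every element can be parsed as an integer; an array of raw text, like the content_cleaned column, raises ValueError instead, so the text would still need to be turned into numbers (tokenized) first.

```python
import numpy as np

# Stand-in for a numeric column: 3 rows, 1 column (made-up values).
arr = np.array([[1], [2], [3]])

# int() needs a single value; a 3-element array cannot become one Python int.
try:
    int(arr)
    int_failed = False
except TypeError as exc:
    int_failed = True
    print("TypeError:", exc)

# astype converts every element and returns a NEW array; arr keeps its dtype.
converted = arr.astype(np.int32)
print(converted.dtype, converted.shape)

# Stand-in for a text column: strings that are not numbers cannot be cast.
text = np.array([["some cleaned text"], ["more text"]])
try:
    text.astype(np.int32)
    text_failed = False
except ValueError as exc:
    text_failed = True
    print("ValueError:", exc)
```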