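I'm trying to do some NLP on Stack Overflow posts to predict a post's tags from its title.
One constraint is that I have to embed my sentences with the Sentence Transformers framework. The idea is to embed the sentences and use the embeddings as input to a neural network that I build.
I'm no expert on neural networks, so I may be missing a lot.
The problem I'm running into is that my input can't be converted to a tensor. I tried to fix it using this post on SO, but I still have the same problem...
Here is my code: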
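import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer
from sklearn.preprocessing import MultiLabelBinarizer
# Keras imports are assumed to come from tensorflow.keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam

# df (with Title and Tags columns) and top_tags are assumed to be defined earlier
# Embed every title with a pretrained Sentence Transformers model
title_list = df.Title.tolist()
model = SentenceTransformer('paraphrase-distilroberta-base-v1')
embeddings = model.encode(title_list)
# Store one embedding vector per DataFrame row
embeddings_list = [elem for elem in embeddings]
df_embed = df
df_embed['Embeddings'] = embeddings_list
df_embed.Embeddings = [np.asarray(x).astype('float32') for x in df_embed.Embeddings]
X = df_embed['Embeddings'].values
y = df_embed.Tags
# Binarize the tags for multi-label classification
mlb = MultiLabelBinarizer(classes=top_tags)
y_mlb = pd.DataFrame(mlb.fit_transform(y), columns=mlb.classes_, index=y.index)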
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X, y_mlb, test_size = 0.3, random_state = 0)
X_val, X_test, y_val, y_test = train_test_split(X_val, y_val, test_size = 0.4, random_state = 0)
model = Sequential()
# Input - Layer
model.add(Dense(100, activation="relu"))
# Hidden - Layers
model.add(Dropout(0.3, noise_shape=None, seed=None))
# Output - Layer
model.add(Dense(50, activation="sigmoid"))
model.compile(loss='binary_crossentropy',
              optimizer=Adam(0.01),
              metrics=['accuracy'])
hist = model.fit(X_train, y_train, batch_size=8, epochs=10, validation_split=0.1)
I get this error:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
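One likely cause, offered as a sketch and not a confirmed diagnosis: taking `.values` of a DataFrame column that holds one array per row gives a 1-D array of dtype `object` whose elements are themselves arrays, and not a 2-D float matrix. The NumPy-only example below reproduces that shape and shows that `np.stack` rebuilds a regular matrix. The random data, the 4 rows, and the 768-dimensional vectors are stand-ins for the real `model.encode` output and are assumptions made for illustration.

```python
import numpy as np

# Hypothetical stand-in for model.encode(title_list): 4 titles, 768-dim vectors.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 768)).astype("float32")

# Mimic what a column of per-row arrays looks like after `.values`:
# a 1-D object array in which every element is its own ndarray.
X_object = np.empty(len(embeddings), dtype=object)
for i, vec in enumerate(embeddings):
    X_object[i] = vec

# Stacking the per-row vectors rebuilds a regular 2-D float32 matrix.
X_dense = np.stack(list(X_object))

print(X_object.dtype, X_object.shape)
print(X_dense.dtype, X_dense.shape)
```

If this is indeed the cause, then building `X` with `np.stack(df_embed['Embeddings'].values)`, or passing `embeddings` directly, may be enough for `model.fit` to accept the input. `y_train` is a DataFrame in the code above, so passing `y_train.values` could also be worth trying.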