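I'm doing sentiment analysis on a CSV with 45k rows and two columns [text, sentiment]. I'm trying to use sigmoid with binary_crossentropy, but it returns this error:
Error when checking target: expected dense_2 to have shape (1,) but got array with shape (2,)
I tried using LabelEncoder, but it fails with a bad input shape. How can I encode the labels so that a Dense(1) sigmoid layer accepts them?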
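import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from keras.preprocessing.text import Tokenizer

# Aim for balanced classes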
num_of_categories = 45247
shuffled = data.reindex(np.random.permutation(data.index))
e = shuffled[shuffled['sentiment'] == 'POS'][:num_of_categories]
b = shuffled[shuffled['sentiment'] == 'NEG'][:num_of_categories]
concated = pd.concat([e,b], ignore_index=True)
for idx, row in data.iterrows():
    row[0] = row[0].replace('rt', ' ')
#Shuffle the dataset
concated = concated.reindex(np.random.permutation(concated.index))
concated['LABEL'] = 0
# Encode the labels
encoder = LabelEncoder()
concated.loc[concated['sentiment'] == 'POS', 'LABEL'] = 0
concated.loc[concated['sentiment'] == 'NEG', 'LABEL'] = 1
print(concated['LABEL'][:10])
labels = encoder.fit_transform(concated)
print(labels[:10])
if 'sentiment' in concated.keys():
    # drop() returns a new DataFrame, so assign the result back
    concated = concated.drop(['sentiment'], axis=1)
n_most_common_words = 8000
max_len = 130
tokenizer = Tokenizer(num_words=n_most_common_words, filters='!"#$%&()*+,-./:;<=>?@[\]^_`{|}~', lower=True)
tokenizer.fit_on_texts(concated['text'].values)
sequences = tokenizer.texts_to_sequences(concated['text'].values)
word_index = tokenizer.word_index
Answer 0 (score: 1)
The output of LabelEncoder is also 1-dimensional, and I guess the output of your network is 2-dimensional. So you need to one-hot encode your y_true.
Use
labels = keras.utils.to_categorical(concated['LABEL'], num_classes=2)
instead of
labels = encoder.fit_transform(concated)
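Whichever encoding you pick, the label shape has to match the last layer of the model. The error quoted in the question (expected shape (1,), got (2,)) is what Keras reports when one-hot labels are fed to a single-unit output, so it is worth checking which of the two setups the model actually uses. Below is a minimal sketch of both label shapes using only NumPy. The `sentiment` array is a made-up stand-in for `concated['sentiment'].values`, and the layer names in the comments are assumptions about the model, which the question does not show.

```python
import numpy as np

# Hypothetical stand-in for concated['sentiment'].values
sentiment = np.array(['POS', 'NEG', 'NEG', 'POS'])

# Option A: one integer per sample, shape (n,).
# Fits a final Dense(1, activation='sigmoid') with binary_crossentropy.
# Same mapping as in the question: POS -> 0, NEG -> 1.
labels_1d = (sentiment == 'NEG').astype('int64')

# Option B: one-hot rows, shape (n, 2).
# Same values as keras.utils.to_categorical(labels_1d, num_classes=2).
# Fits a final Dense(2, activation='softmax') with categorical_crossentropy.
labels_onehot = np.eye(2, dtype='float32')[labels_1d]

print(labels_1d.shape)      # (4,)
print(labels_onehot.shape)  # (4, 2)
```

If you do want LabelEncoder, note that it expects a single 1-D column, for example `encoder.fit_transform(concated['sentiment'])`, rather than the whole DataFrame.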