How do I build an embedding layer with TensorFlow for the following model?

Time: 2018-10-02 19:11:49

Tags: tensorflow keras deep-learning tensorboard sentiment-analysis

This is a Keras model for sentiment analysis that I need to convert to TensorFlow. I have not been able to build the embedding layer with TensorFlow, and I also want to evaluate this model with a confusion matrix. I am also asking whether tf-learn is the same as TensorFlow.

import os     
import numpy as np  
import pandas as pd  
import tensorflow as tf  
from tensorflow import set_random_seed
set_random_seed(2)
from nltk.tokenize import word_tokenize
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from keras.preprocessing.sequence import pad_sequences
from keras.preprocessing.text import Tokenizer
from sklearn.preprocessing import  LabelEncoder
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers.embeddings import Embedding
from keras.layers import Flatten
from keras.layers import Conv1D, MaxPooling1D
from keras.layers import Dense,Activation
from keras.layers import Dropout
from keras.callbacks import TensorBoard, ModelCheckpoint
import re
import string
import collections
import time
seed = 10

Read the CSV file

df=pd.read_csv('tweets-pos-neg.csv', usecols = ['text','airline_sentiment'])
df = df.reindex(['text','airline_sentiment'], axis=1) #reorder columns
df=df.apply(lambda x: x.astype(str).str.lower()) 

Normalize the text

def normalize(text):
    text = re.sub(r"http\S+", r'', text)     # remove URLs
    text = re.sub(r"@\S+", r'', text)        # remove @mentions
    punctuation = re.compile(r'[!"#$%&()*+,-./:;<=>?@[\]^_`{|}~|0-9]')
    text = re.sub(punctuation, ' ', text)    # remove punctuation and digits
    text = re.sub(r'(.)\1\1+', r'\1', text)  # collapse characters repeated 3+ times
    return text

Clean the text

def prepareDataSets(df):
    sentences = []
    for index, r in df.iterrows():
        text = normalize(r['text'])
        sentences.append([text, r['airline_sentiment']])
    df_sentences = pd.DataFrame(sentences, columns=['text', 'airline_sentiment'])
    return df_sentences
edit_df=prepareDataSets(df)
edit_df=shuffle(edit_df)
X=edit_df.iloc[:,0]
Y=edit_df.iloc[:,1]

Tokenize the reviews

max_features = 50000
tokenizer = Tokenizer(num_words=max_features, split=' ')
tokenizer.fit_on_texts(X.values)
# convert review tokens to integers
X_seq = tokenizer.texts_to_sequences(X)

Pad the sequences so that all vectors have the same size (the maximum review length)

seq_len=35
X_pad = pad_sequences(X_seq,maxlen=seq_len)   

Convert the target values from strings to integers

le=LabelEncoder()
Y_le=le.fit_transform(Y)
Y_le_oh=to_categorical(Y_le)

Train/test split

X_train, X_test, Y_train, Y_test = train_test_split(X_pad, Y_le_oh, test_size=0.33, random_state=42)
X_train, X_Val, Y_train, Y_Val = train_test_split(X_train, Y_train, test_size=0.1, random_state=42)
print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)
print(X_Val.shape,Y_Val.shape) 

Create the model

embedding_vecor_length = 32    # no of vector columns
model_cnn = Sequential()
model_cnn.add(Embedding(max_features, embedding_vecor_length, input_length=seq_len))
model_cnn.add(Conv1D(filters=100, kernel_size=2, padding='valid', activation='relu', strides=1))
model_cnn.add(MaxPooling1D(2))
model_cnn.add(Flatten())
model_cnn.add(Dense(256, activation='relu'))
model_cnn.add(Dense(2, activation='softmax'))
opt = tf.keras.optimizers.Adam(lr=0.001, decay=1e-6)
model_cnn.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
print(model_cnn.summary())

Train and evaluate the model

tensorboard = TensorBoard(log_dir='./logs')  # TensorBoard callback used below (log directory is an assumption)
history = model_cnn.fit(X_train, Y_train, epochs=3, batch_size=32, callbacks=[tensorboard], validation_data=(X_Val, Y_Val))
scores = model_cnn.evaluate(X_test, Y_test, verbose=0)
print("Accuracy: %.2f%%" % (scores[-1]*100))

1 Answer:

Answer 0 (score: 0)

If you only need the TensorFlow API for training/evaluation, you can use the model_to_estimator function to build an Estimator from your Keras model.

Here is the documentation, with an example.
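
A minimal sketch of that approach, assuming a TF 1.x environment and that model_cnn is built and compiled with tf.keras layers; the feature key passed to the input functions must match the name of the model's input layer, and the float32 casts are an assumption to match the default dtype of the Keras input placeholder:

estimator = tf.keras.estimator.model_to_estimator(keras_model=model_cnn)

input_name = model_cnn.input_names[0]  # name of the Embedding input layer
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={input_name: X_train.astype(np.float32)},
    y=Y_train.astype(np.float32),
    batch_size=32, num_epochs=3, shuffle=True)
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={input_name: X_test.astype(np.float32)},
    y=Y_test.astype(np.float32),
    num_epochs=1, shuffle=False)

estimator.train(input_fn=train_input_fn)
print(estimator.evaluate(input_fn=eval_input_fn))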