Using LSTM model weights to train on new data for text classification

Time: 2018-11-26 07:45:12

Tags: keras lstm text-classification

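I built an LSTM model for text classification using Keras. Now I have new data to train on. Instead of appending it to the original data and retraining the model from scratch, I thought of loading the saved model weights and continuing training on the new data only. However, no matter how much I train, the model fails to predict the correct class, even when I give it the very same sentences to predict. What could be the reason? Please help.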

1 Answer:

Answer 0: (score: 0)

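Are you saving the trained model and then loading it along the following lines?

from keras.models import load_model

# load_model() restores everything that model.save('model.h5') wrote:
# architecture, weights, and optimizer state
model = load_model('model.h5')
# load_weights() fills the existing model in place; its return value is
# not a model, so do not assign it back to model
model.load_weights('model_weights.h5')

# train on new data
model.compile...
model.fit...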
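
For reference, the save / load / continue-training round trip could look roughly like the sketch below. The toy data, layer sizes, and temporary file path are made up for illustration and are not from the question; only `model.save`, `load_model`, and `fit` are the actual Keras calls being demonstrated.

```python
import os
import tempfile

import numpy as np
from keras.layers import LSTM, Dense, Embedding
from keras.models import Sequential, load_model

# Hypothetical toy data: integer-encoded sentences (ids 1..49, length 10)
# with 3 classes. Real data would come from your own tokenizer.
rng = np.random.RandomState(0)
x_old, y_old = rng.randint(1, 50, size=(32, 10)), rng.randint(0, 3, size=32)
x_new, y_new = rng.randint(1, 50, size=(16, 10)), rng.randint(0, 3, size=16)

model = Sequential([
    Embedding(input_dim=50, output_dim=8),
    LSTM(8),
    Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x_old, y_old, epochs=1, batch_size=8, verbose=0)

# model.save() writes architecture, weights, and optimizer state to one file.
path = os.path.join(tempfile.mkdtemp(), "model.h5")
model.save(path)

# load_model() restores all of that, so the restored model can be trained
# further without a separate load_weights() call or a re-compile.
restored = load_model(path)

# Sanity check: the restored model predicts the same as the original.
same_before = np.allclose(
    model.predict(x_old, verbose=0),
    restored.predict(x_old, verbose=0),
    atol=1e-5,
)

# Continue training on the new data only.
restored.fit(x_new, y_new, epochs=1, batch_size=8, verbose=0)
```

If `same_before` is true on your own data, saving and loading are fine and the problem lies elsewhere.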


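The loaded model is exactly the same as the model that was saved. If that is what you are doing, then something in the new data must be different from the data the model was originally trained on.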
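
One difference worth ruling out (an assumption on my part, the question does not show the preprocessing) is the word-to-index mapping. If the vocabulary is rebuilt from the new data alone, the same sentence turns into different integers, and the embedding rows the model learned no longer line up with the words. A minimal pure-Python sketch, where `build_vocab` and `encode` are hypothetical helpers standing in for whatever tokenizer you use:

```python
def build_vocab(sentences):
    """Give each unseen word the next free id; 0 is reserved for unknown words."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def encode(sentence, vocab):
    """Map a sentence to integer ids, using 0 for words not in the vocabulary."""
    return [vocab.get(word, 0) for word in sentence.lower().split()]


# Made-up example sentences, for illustration only.
old_data = ["the movie was great", "the plot was boring"]
new_data = ["boring movie", "great acting"]

old_vocab = build_vocab(old_data)    # mapping used for the original training
refit_vocab = build_vocab(new_data)  # mistake: mapping rebuilt from new data only

sentence = "the movie was great"
ids_original = encode(sentence, old_vocab)  # what the model saw during training
ids_refit = encode(sentence, refit_vocab)   # what it sees after the re-fit

print(ids_original)
print(ids_refit)
```

The two encodings differ for the identical sentence, which would explain wrong predictions even on sentences the model has already seen. The fix in that case is to save the fitted tokenizer (and the label encoding) together with the model and reuse them for the new data.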