在我的项目中,我使用Python库gensim进行主题建模/文本提取。 我尝试加载训练有素的LdaMallet模型对新的看不见的文本进行分类。
第一部分是加载模型。
<?php
$db=Db::getConnection(); // singleton
$this->var['result']=$db->query($_POST['query']);
if (isset($this->var['result'])) {
echo '<table cellpadding="5" cellspacing="2" border="1"><tbody>';
$titles=array_keys(@$this->var['result'][0]);
echo '<tr>';
foreach($titles as $title)
echo '<th>'.$title.'</th>';
echo '</tr>';
foreach($this->var['result'] as $row) {
echo '<tr>';
foreach($row as $cell) {
echo '<td>'.$cell.'</td>';
}
echo '</tr>';
}
echo '</tbody></table>';
}
?>
我不确定将ldaMallet转换为LdaModel的最后一行。这是获得结果的唯一方法。
然后第二部分是准备新数据并对其进行分类。
$sql = $this->mysqli->query($query);
$this->lastInsertId = $this->mysqli->insert_id;
$errno = $this->mysqli->connect_errno;
if ($errno === 0) {
if ($this->mysqli->field_count > 0) {
while ($row = $sql->fetch_assoc()) $response[] = $row;
return $response;
}else{
return true;
}
}
然后结果看起来像这样:
import os
dirname = os.path.dirname(__file__)
filename = os.path.join(dirname, 'mallet-2.0.8/bin/mallet')
# Download File: http://mallet.cs.umass.edu/dist/mallet-2.0.8.zip
os.environ['MALLET_HOME'] = # path to mallet
ldaMallet = gensim.models.wrappers.LdaMallet.load('lda_malletoutputCommentsAndMethods.model)
ldaModel = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(ldaMallet)
无论我在from gensim.test.utils import common_dictionary
other_texts = [['new', 'document', 'to', 'classify', 'as', 'array']]
other_corpus = [common_dictionary.doc2bow(text) for text in other_texts]
vector = ldaModel[other_corpus[0]]
# sorts the result by probability and not by topic ID
print(sorted(vector, key=lambda x: x[1], reverse=True))
数组中使用哪种文本,此结果都不会改变,但是应该改变。