Question

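I trained a maxent document classification model with Mallet, and it turned out to be 130MB, which is too large for the instance I want to run it on. I'm wondering whether there is a way to reduce the model's vocabulary and thereby shrink the overall model size. Is there a pipe that does this? The pipeline I'm currently using is: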

    Pipe instancePipe = new SerialPipes(new Pipe[]{
        new Target2Label(),                        // create labels
        new Input2CharSequence("UTF-8"),           // read the file as a string
        new CharSequence2TokenSequence(),          // tokenize the string
        new TokenSequenceLowercase(),              // lowercase the tokens
        new TokenSequenceRemoveStopwords(false),   // remove stopwords
        new TokenSequence2FeatureSequence(),       // convert tokens to features
        new FeatureSequence2FeatureVector(),       // create the feature vector
        // new PrintInputAndTarget()               // print everything
    });

Any other tips for reducing the model size would also be helpful.

Answer 1

The simplest approach is to prune the vocabulary after the initial import. Run

    bin/mallet prune --help

to see the available options.
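
To make the idea of pruning concrete, here is a minimal sketch in plain Java of count-based vocabulary pruning: keep only the tokens that appear in at least a given number of documents and drop the rest before building feature vectors. This is not Mallet's API or its implementation; the class `VocabularyPruner`, its methods, and the sample documents are hypothetical names chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Mallet: illustrates count-based pruning.
public class VocabularyPruner {

    /** Returns the tokens that occur in at least minDocs documents. */
    public static Set<String> keepFrequent(List<List<String>> docs, int minDocs) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> doc : docs) {
            // Count each token once per document (document frequency).
            for (String token : new HashSet<>(doc)) {
                docFreq.merge(token, 1, Integer::sum);
            }
        }
        Set<String> kept = new TreeSet<>(); // sorted for stable output
        for (Map.Entry<String, Integer> e : docFreq.entrySet()) {
            if (e.getValue() >= minDocs) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    /** Drops every token that is not in the kept vocabulary. */
    public static List<String> prune(List<String> doc, Set<String> kept) {
        List<String> out = new ArrayList<>();
        for (String token : doc) {
            if (kept.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data, already tokenized and lowercased.
        List<List<String>> docs = List.of(
            List.of("cheap", "pills", "online"),
            List.of("meeting", "online", "today"),
            List.of("cheap", "flights", "online"));

        Set<String> kept = keepFrequent(docs, 2);
        System.out.println(kept);                      // prints [cheap, online]
        System.out.println(prune(docs.get(1), kept));  // prints [online]
    }
}
```

A maxent classifier stores roughly one weight per (feature, label) pair, so under that assumption the model size should shrink about in proportion to the number of features removed. Rare tokens usually make up most of the vocabulary, so even a small minimum count tends to remove a large share of it.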
