Mallet: OutOfMemoryError: Java heap space

Time: 2017-06-22 03:20:12

Tags: java machine-learning out-of-memory translation mallet

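While training on data in Mallet, processing stops with an OutOfMemoryError. The MEMORY attribute in bin/mallet has been set to 3GB. The training file, output.mallet, is only 31 MB. I tried reducing the size of the training data, but it still throws the same error: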

a161115@a161115-Inspiron-3250:~/dev/test_models/Mallet$ bin/mallet train-classifier --input output.mallet --trainer NaiveBayes --training-portion 0.0001 --num-trials 10
Training portion = 1.0E-4
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.9999

-------------------- Trial 0  --------------------

Trial 0 Training NaiveBayesTrainer with 7 instances
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at cc.mallet.types.Multinomial$Estimator.setAlphabet(Multinomial.java:309)
        at cc.mallet.classify.NaiveBayesTrainer.setup(NaiveBayesTrainer.java:251)
        at cc.mallet.classify.NaiveBayesTrainer.trainIncremental(NaiveBayesTrainer.java:200)
        at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:193)
        at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:59)
        at cc.mallet.classify.tui.Vectors2Classify.main(Vectors2Classify.java:415)

I would appreciate any help or insight into this problem.

Edit: here is my bin/mallet file.

#!/bin/bash


malletdir=`dirname $0`
malletdir=`dirname $malletdir`

cp=$malletdir/class:$malletdir/lib/mallet-deps.jar:$CLASSPATH
#echo $cp

MEMORY=10g

CMD=$1
shift

help()
{
cat <<EOF
Mallet 2.0 commands: 

  import-dir         load the contents of a directory into mallet instances (one per file)
  import-file        load a single file into mallet instances (one per line)
  import-svmlight    load SVMLight format data files into Mallet instances
  info               get information about Mallet instances
  train-classifier   train a classifier from Mallet data files
  classify-dir       classify data from a single file with a saved classifier
  classify-file      classify the contents of a directory with a saved classifier
  classify-svmlight  classify data from a single file in SVMLight format
  train-topics       train a topic model from Mallet data files
  infer-topics       use a trained topic model to infer topics for new documents
  evaluate-topics    estimate the probability of new documents under a trained model
  prune              remove features based on frequency or information gain
  split              divide data into testing, training, and validation portions
  bulk-load          for big input files, efficiently prune vocabulary and import docs

Include --help with any option for more information
EOF
}

CLASS=

case $CMD in
        import-dir) CLASS=cc.mallet.classify.tui.Text2Vectors;;
        import-file) CLASS=cc.mallet.classify.tui.Csv2Vectors;;
        import-svmlight) CLASS=cc.mallet.classify.tui.SvmLight2Vectors;;
        info) CLASS=cc.mallet.classify.tui.Vectors2Info;;
        train-classifier) CLASS=cc.mallet.classify.tui.Vectors2Classify;;
        classify-dir) CLASS=cc.mallet.classify.tui.Text2Classify;;
        classify-file) CLASS=cc.mallet.classify.tui.Csv2Classify;;
        classify-svmlight) CLASS=cc.mallet.classify.tui.SvmLight2Classify;;
        train-topics) CLASS=cc.mallet.topics.tui.TopicTrainer;;
        infer-topics) CLASS=cc.mallet.topics.tui.InferTopics;;
        evaluate-topics) CLASS=cc.mallet.topics.tui.EvaluateTopics;;
        prune) CLASS=cc.mallet.classify.tui.Vectors2Vectors;;
        split) CLASS=cc.mallet.classify.tui.Vectors2Vectors;;
        bulk-load) CLASS=cc.mallet.util.BulkLoader;;
        run) CLASS=$1; shift;;
        *) echo "Unrecognized command: $CMD"; help; exit 1;;
esac

java -Xmx$MEMORY -ea -Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -classpath "$cp" $CLASS "$@"

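It is also worth mentioning that my original training file has 60,000 items. When I reduce the number of items (to 20,000 instances), training runs as normal but uses about 10 GB of RAM.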
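A rough way to reason about why a 31 MB file can need gigabytes of heap: the stack trace fails inside Multinomial$Estimator.setAlphabet, called from NaiveBayesTrainer.setup, before any real training happens. Assuming (this is an assumption, not confirmed by the thread) that the trainer allocates one dense array of doubles per class, sized to the whole feature alphabet, memory would scale with classes times features rather than with the number of training instances. The sketch below only does that arithmetic; the class name, method name, and both counts are hypothetical and would have to be replaced with the real numbers for the data set.

```java
public class NaiveBayesMemoryEstimate {

    // Assumption: one double (8 bytes) per (label, feature) pair.
    static long estimateBytes(long numLabels, long numFeatures) {
        return numLabels * numFeatures * 8L;
    }

    public static void main(String[] args) {
        long numLabels = 1_000;      // hypothetical number of classes
        long numFeatures = 500_000;  // hypothetical alphabet (vocabulary) size

        long bytes = estimateBytes(numLabels, numFeatures);
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);
        System.out.printf("Estimated dense count arrays: %.1f GiB%n", gib);
    }
}
```

If the estimate for the real label and feature counts is far above the configured -Xmx value, shrinking the training portion would not be expected to help, which would match the failure with only 7 instances.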

2 answers:

Answer 0 (score: 1)

Check the call to java in bin/mallet and add the flag -Xmx3g, making sure there is no other -Xmx in it (if there is, edit that one instead).
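To confirm that the -Xmx value actually reaches the JVM, a minimal sketch like the following can help. It prints the maximum heap the JVM reports through Runtime.getRuntime().maxMemory(), a standard Java API. The class name HeapCheck and the helper toMiB are made up for this illustration.

```java
public class HeapCheck {

    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() is the most heap the JVM will attempt to use;
        // it roughly tracks the -Xmx value the JVM was started with.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMiB(maxBytes) + " MiB");
    }
}
```

After compiling, running it as `java -Xmx3g HeapCheck` should report a value close to 3072 MiB (the exact figure varies by JVM). Since the script in the question has a `run` command and appends $CLASSPATH to its classpath, something like `CLASSPATH=. bin/mallet run HeapCheck` should show the heap size the script really passes, assuming HeapCheck.class is in the current directory.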

Answer 1 (score: 0)

I usually change both files, the mallet file and Mallet.bat, and set the memory to the maximum.

Mallet.bat:

java -Xmx%MALLET_MEMORY% -ea -Dfile.encoding=%MALLET_ENCODING% -classpath %MALLET_CLASSPATH% %CLASS% %MALLET_ARGS%

I replace %MALLET_MEMORY% (in Mallet.bat) and $MEMORY (in bin/mallet) with the amount of memory I want, for example 4G.