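When I train on data in Mallet, the process stops with an OutOfMemoryError. The MEMORY attribute in bin/mallet is already set to 3GB, and the training file output.mallet is only 31 MB. I tried reducing the size of the training data, but it still throws the same error: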
a161115@a161115-Inspiron-3250:~/dev/test_models/Mallet$ bin/mallet train-classifier --input output.mallet --trainer NaiveBayes --training-portion 0.0001 --num-trials 10
Training portion = 1.0E-4
Unlabeled training sub-portion = 0.0
Validation portion = 0.0
Testing portion = 0.9999
-------------------- Trial 0 --------------------
Trial 0 Training NaiveBayesTrainer with 7 instances
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at cc.mallet.types.Multinomial$Estimator.setAlphabet(Multinomial.java:309)
    at cc.mallet.classify.NaiveBayesTrainer.setup(NaiveBayesTrainer.java:251)
    at cc.mallet.classify.NaiveBayesTrainer.trainIncremental(NaiveBayesTrainer.java:200)
    at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:193)
    at cc.mallet.classify.NaiveBayesTrainer.train(NaiveBayesTrainer.java:59)
    at cc.mallet.classify.tui.Vectors2Classify.main(Vectors2Classify.java:415)
I would appreciate any help or insight into this problem.
Edit: this is my bin/mallet file.
#!/bin/bash
malletdir=`dirname $0`
malletdir=`dirname $malletdir`
cp=$malletdir/class:$malletdir/lib/mallet-deps.jar:$CLASSPATH
#echo $cp
MEMORY=10g
CMD=$1
shift
help()
{
cat <<EOF
Mallet 2.0 commands:
  import-dir         load the contents of a directory into mallet instances (one per file)
  import-file        load a single file into mallet instances (one per line)
  import-svmlight    load SVMLight format data files into Mallet instances
  info               get information about Mallet instances
  train-classifier   train a classifier from Mallet data files
  classify-dir       classify data from a single file with a saved classifier
  classify-file      classify the contents of a directory with a saved classifier
  classify-svmlight  classify data from a single file in SVMLight format
  train-topics       train a topic model from Mallet data files
  infer-topics       use a trained topic model to infer topics for new documents
  evaluate-topics    estimate the probability of new documents under a trained model
  prune              remove features based on frequency or information gain
  split              divide data into testing, training, and validation portions
  bulk-load          for big input files, efficiently prune vocabulary and import docs
Include --help with any option for more information
EOF
}
CLASS=
case $CMD in
  import-dir) CLASS=cc.mallet.classify.tui.Text2Vectors;;
  import-file) CLASS=cc.mallet.classify.tui.Csv2Vectors;;
  import-svmlight) CLASS=cc.mallet.classify.tui.SvmLight2Vectors;;
  info) CLASS=cc.mallet.classify.tui.Vectors2Info;;
  train-classifier) CLASS=cc.mallet.classify.tui.Vectors2Classify;;
  classify-dir) CLASS=cc.mallet.classify.tui.Text2Classify;;
  classify-file) CLASS=cc.mallet.classify.tui.Csv2Classify;;
  classify-svmlight) CLASS=cc.mallet.classify.tui.SvmLight2Classify;;
  train-topics) CLASS=cc.mallet.topics.tui.TopicTrainer;;
  infer-topics) CLASS=cc.mallet.topics.tui.InferTopics;;
  evaluate-topics) CLASS=cc.mallet.topics.tui.EvaluateTopics;;
  prune) CLASS=cc.mallet.classify.tui.Vectors2Vectors;;
  split) CLASS=cc.mallet.classify.tui.Vectors2Vectors;;
  bulk-load) CLASS=cc.mallet.util.BulkLoader;;
  run) CLASS=$1; shift;;
  *) echo "Unrecognized command: $CMD"; help; exit 1;;
esac
java -Xmx$MEMORY -ea -Djava.awt.headless=true -Dfile.encoding=UTF-8 -server -classpath "$cp" $CLASS "$@"
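It is also worth mentioning that my original training file has 60,000 items. When I reduce the number of items (to 20,000 instances), training runs normally but uses about 10GB of RAM.
Answer 0: (score: 1)
Check the call to Java in bin/mallet and add the flag -Xmx3g, making sure there is no other Xmx in it (if there is, edit that one).
Answer 1: (score: 0)
I usually change both files: the mallet file, setting the memory to the maximum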
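One way to carry out the check described above is to count the -Xmx flags in the launcher script. This is only a sketch: count_xmx is a hypothetical helper name, not part of Mallet, and it assumes the script lives at bin/mallet unless another path is given.

```shell
#!/bin/bash
# Hypothetical helper (not part of Mallet): count how many -Xmx flags
# appear in a launcher script. Defaults to bin/mallet.
count_xmx() {
    local script="${1:-bin/mallet}"
    # grep -o prints each match on its own line; wc -l counts those lines.
    grep -o -- '-Xmx[^ ]*' "$script" | wc -l | tr -d ' '
}
```

Running count_xmx bin/mallet should print 1 for the script shown in the question. To see which heap limit the JVM actually applies, java -Xmx3g -XX:+PrintFlagsFinal -version | grep MaxHeapSize reports the effective value in bytes.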
and
Mallet.bat:
java -Xmx%MALLET_MEMORY% -ea -Dfile.encoding=%MALLET_ENCODING% -classpath %MALLET_CLASSPATH% %CLASS% %MALLET_ARGS%
I replaced %MALLET_MEMORY% and $MEMORY (the parts in bold) with the amount of memory I wanted: 4G
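Instead of hard-coding the value in both files, the heap size in bin/mallet could be read from the environment. The following is a minimal sketch under assumptions: resolve_memory is a hypothetical helper, and reusing the name MALLET_MEMORY on the Unix side is a choice made here to mirror Mallet.bat, not something the stock script does.

```shell
#!/bin/bash
# Hypothetical helper (not part of Mallet): use MALLET_MEMORY if it is set
# and non-empty, otherwise fall back to the default passed as $1 (4g here).
resolve_memory() {
    local default="${1:-4g}"
    echo "${MALLET_MEMORY:-$default}"
}

# This assignment would replace the fixed MEMORY=... line in bin/mallet.
MEMORY=$(resolve_memory 4g)
echo "java -Xmx$MEMORY"
```

With this change, MALLET_MEMORY=6g bin/mallet train-classifier ... would run with a 6 GB heap, and the script falls back to 4g when the variable is unset.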