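I want to do some simple text classification. I have tried different packages, but all of them do everything in memory.
It works smoothly for small inputs, but the larger the input, the slower it gets.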
"use strict";
const NaturalSynaptic = require("natural-synaptic");
// After getting the data
rows = rows.map(c => (c.content && c.category_name ? {
    input: c.content
  , output: c.category_name
} : null)).filter(Boolean);
var classifier = new NaturalSynaptic();
// This part is relatively fast
rows.forEach((c, i) => {
    classifier.addDocument(c.input, c.output);
});
// It gets stuck here
classifier.train();
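After training finishes, I want to use classifier.classify('did the tests pass?')
to predict the output.
When it gets stuck, one of the CPUs jumps to 100%. I suspect this is because of the for
loops in the library.
What is the right way to do this? How can I handle this much data as input?
After waiting long enough, it ended just as I expected: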
<--- Last few GCs --->
1300704 ms: Mark-sweep 1194.3 (1458.1) -> 1194.3 (1458.1) MB, 238.2 / 0 ms [allocation failure] [scavenge might not succeed].
1300955 ms: Mark-sweep 1194.3 (1458.1) -> 1194.3 (1458.1) MB, 251.7 / 0 ms [allocation failure] [scavenge might not succeed].
1301199 ms: Mark-sweep 1194.3 (1458.1) -> 1194.3 (1458.1) MB, 244.0 / 0 ms [last resort gc].
1301432 ms: Mark-sweep 1194.3 (1458.1) -> 1194.3 (1458.1) MB, 232.9 / 0 ms [last resort gc].
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0x1326850e3ac1 <JS Object>
2: textToFeatures [/home/ionicabizau/.../node_modules/natural/lib/natural/classifiers/classifier.js:~82] [pc=0x3204073474c8] (this=0xd98447d4ab1 <JS Object>,observation=0x2eb16ebfc7d9 <JS Array[36]>)
3: train [/home/ionicabizau/.../node_modules/natural/lib/natural/classifiers/classifier.js:101] [pc=0x32040734600d]...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Aborted (core dumped)
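
The stack trace ends in textToFeatures, called from train. My guess (an assumption about the library's internals, not something I have confirmed) is that every document gets turned into a dense array with one slot per distinct term, so memory would grow as documents × vocabulary. Below is a rough sketch for estimating that size before calling train(). The helper name estimateDenseFeatureBytes, the whitespace tokenizer and the 8 bytes per slot are all hypothetical simplifications; the library's own tokenizer and stemmer would give a different vocabulary.

```javascript
// Hypothetical helper, not part of natural or natural-synaptic.
// Estimates the size of a dense document-by-term matrix, assuming:
//   - tokens are split on whitespace (the real library stems and tokenizes differently)
//   - each array slot costs bytesPerSlot bytes (8 is a guess for a 64-bit build)
function estimateDenseFeatureBytes(rows, bytesPerSlot = 8) {
    const vocabulary = new Set();
    for (const row of rows) {
        const tokens = row.input.toLowerCase().split(/\s+/).filter(Boolean);
        for (const token of tokens) {
            vocabulary.add(token);
        }
    }
    // One slot per (document, term) pair if every feature vector is dense.
    const slots = rows.length * vocabulary.size;
    return {
        documents: rows.length,
        vocabularySize: vocabulary.size,
        slots: slots,
        bytes: slots * bytesPerSlot
    };
}

// Tiny made-up sample in the same { input, output } shape as the rows above.
const sampleRows = [
    { input: "did the tests pass", output: "ci" },
    { input: "the build failed", output: "ci" }
];
console.log(estimateDenseFeatureBytes(sampleRows));
```

If the estimate for the real rows comes out close to the roughly 1.4 GB heap shown in the GC lines above, that would support the guess that the dense vectors, rather than the loop itself, are what exhausts memory.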