Question

I want to do some simple text classification. I have tried several packages, but all of them do everything in memory.

With small inputs it works very smoothly, but the larger the input, the slower it gets.

"use strict";

const NaturalSynaptic = require("natural-synaptic");    

// After getting the data
rows = rows.map(c => (c.content && c.category_name ? {
    input: c.content
  , output: c.category_name
} : null)).filter(Boolean);

var classifier = new NaturalSynaptic();

// This part is relatively fast
rows.forEach((c, i) => {
    classifier.addDocument(c.input, c.output);
});

// It gets stuck here    
classifier.train();

Once training has finished, I want to predict the output using classifier.classify('did the tests pass?').

While it is stuck, one of the CPU cores jumps to 100%. I suspect this is caused by a for loop inside the library.

What is the right way to do this? How can I handle this much data as input?

After waiting long enough, it ended the way I expected, with this:

<--- Last few GCs --->

 1300704 ms: Mark-sweep 1194.3 (1458.1) -> 1194.3 (1458.1) MB, 238.2 / 0 ms [allocation failure] [scavenge might not succeed].
 1300955 ms: Mark-sweep 1194.3 (1458.1) -> 1194.3 (1458.1) MB, 251.7 / 0 ms [allocation failure] [scavenge might not succeed].
 1301199 ms: Mark-sweep 1194.3 (1458.1) -> 1194.3 (1458.1) MB, 244.0 / 0 ms [last resort gc].
 1301432 ms: Mark-sweep 1194.3 (1458.1) -> 1194.3 (1458.1) MB, 232.9 / 0 ms [last resort gc].


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x1326850e3ac1 <JS Object>
    2: textToFeatures [/home/ionicabizau/.../node_modules/natural/lib/natural/classifiers/classifier.js:~82] [pc=0x3204073474c8] (this=0xd98447d4ab1 <JS Object>,observation=0x2eb16ebfc7d9 <JS Array[36]>)
    3: train [/home/ionicabizau/.../node_modules/natural/lib/natural/classifiers/classifier.js:101] [pc=0x32040734600d]...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Aborted (core dumped)
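
To put a rough number on why this runs out of memory: the trace above fails inside textToFeatures, called from train. The sketch below assumes (I have not verified this against the library source) that each document is expanded into a dense array with one numeric slot per distinct token in the whole corpus. estimateDenseFeatureBytes is a hypothetical helper of my own, not part of natural or natural-synaptic, and the whitespace tokenizer and 8 bytes per slot are simplifications.

```javascript
"use strict";

// Hypothetical helper, NOT part of natural or natural-synaptic.
// Assumption: every document becomes a dense array with one numeric
// slot per distinct token seen anywhere in the corpus.
function estimateDenseFeatureBytes(texts, bytesPerSlot = 8) {
  const vocabulary = new Set();

  for (const text of texts) {
    // Naive whitespace tokenizer; a real tokenizer/stemmer would
    // produce a different (usually smaller) vocabulary.
    const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
    for (const token of tokens) {
      vocabulary.add(token);
    }
  }

  return {
    documents: texts.length,
    vocabularySize: vocabulary.size,
    // documents x vocabulary slots, each taking bytesPerSlot bytes
    bytes: texts.length * vocabulary.size * bytesPerSlot,
  };
}

// Tiny made-up sample, only to show the shape of the result.
const sample = ["did the tests pass", "the build failed", "deploy the build"];
const estimate = estimateDenseFeatureBytes(sample);
console.log(estimate);
```

Under that assumption the cost grows with documents times vocabulary size. With purely hypothetical numbers, 100,000 documents and 50,000 distinct tokens would need about 40 GB, far more than the roughly 1.4 GB heap shown in the GC log.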

How do I handle big data when training a neural network?

0 answers: