To implement ordinal classification I have to use scikit-learn's clone. But clone seems to cause a huge amount of memory waste! How can I release just the memory each model holds for its portion of the data? Example code below.
import gc
import os
import sys
import psutil
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.base import clone

p = psutil.Process(os.getpid())

X = np.random.normal(size=(10000, 50))
Y = np.floor(np.random.rand(10000) * 10)

def print_mem():
    print("{:.0f}MB".format(p.memory_info().rss / 1e6))

print_mem()

my_model = RandomForestClassifier(n_estimators=1000, max_features=1, n_jobs=28)
models = []
for i in np.sort(np.array(list(set(Y)))):
    if i != max(set(Y)):
        model = clone(my_model)
        model.fit(X, Y > i)  # one binary classifier per ordinal threshold
        models.append(model)
        gc.collect()
        print_mem()

print(sys.getsizeof(models[0]) / 1e3, " KB / model")
Output:
102MB
313MB
602MB
953MB
1346MB
1752MB
2139MB
2488MB
2708MB
2872MB
0.056 KB / model
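Note that the 0.056 KB/model figure is misleading: sys.getsizeof only reports the shallow size of the top-level Python object, not the fitted trees stored in the estimator's nested attributes (e.g. estimators_). A rough deep-size check via pickling (a sketch with smaller, illustrative dimensions; pickle size is not exactly equal to resident memory, but it is a far better estimate):

```python
import pickle
import sys
import numpy as np
from sklearn.ensemble import RandomForestClassifier

X = np.random.normal(size=(1000, 50))
y = np.random.rand(1000) > 0.5  # binary target, as in the "Y > i" loop above

model = RandomForestClassifier(n_estimators=10, n_jobs=1)
model.fit(X, y)

shallow = sys.getsizeof(model)    # just the wrapper object, a few dozen bytes
deep = len(pickle.dumps(model))   # serializes the fitted trees too
print("shallow: {} B, pickled: {:.0f} KB".format(shallow, deep / 1e3))
```

With 1000 trees per model, as in the question, the true per-model footprint measured this way would be orders of magnitude larger than what sys.getsizeof suggests.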
System information:
Linux-4.10.0-28-generic-x86_64-with-debian-stretch-sid
Python 3.6.3 |Anaconda, Inc.| (default, Oct 13 2017, 12:02:49)
[GCC 7.2.0]
NumPy 1.13.3
SciPy 0.19.1
Scikit-Learn 0.19.1