As the title says, I'm trying to use Turi Create on an AWS SageMaker Notebook instance with Python 3.6 (the conda_amazonei_mxnet_p36 environment). Even though CUDA 10.0 is installed by default, CUDA 8.0 is also preinstalled and can be selected from within the notebook with the following commands:
!sudo rm /usr/local/cuda
!sudo ln -s /usr/local/cuda-8.0 /usr/local/cuda
I have verified this installation with nvcc --version and by building and running the deviceQuery sample:
$ cd /usr/local/cuda/samples/1_Utilities/deviceQuery
$ sudo make
$ ./deviceQuery
Next, in my notebook, I install Turi Create and the matching mxnet build for CUDA 8.0:
!pip install turicreate==5.4
!pip uninstall -y mxnet
!pip install mxnet-cu80==1.1.0
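As a quick sanity check (a minimal sketch, not something Turi Create requires), this mxnet build should import cleanly and be able to allocate on a GPU before I hand it anything:
import mxnet as mx
print(mx.__version__)                    # expect 1.1.0 here
print(mx.nd.zeros((1,), ctx=mx.gpu(0)))  # raises MXNetError if the CUDA runtime is broken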
Then I prepare my images and try to create the model:
import turicreate as tc

tc.config.set_num_gpus(-1)  # -1 = use all available GPUs
images = tc.image_analysis.load_images('images', ignore_failure=True)
data = images.join(annotations_)  # annotations_: SFrame of per-image annotations, loaded earlier (not shown)
train_data, test_data = data.random_split(0.8)
model = tc.object_detector.create(train_data, max_iterations=50)
Running tc.object_detector.create produces the following output:
Using 'image' as feature column
Using 'annotaion' as annotations column
Downloading https://docs-assets.developer.apple.com/turicreate/models/darknet.params
Download completed: /var/tmp/model_cache/darknet.params
Setting 'batch_size' to 32
Using GPUs to create model (Tesla K80, Tesla K80, Tesla K80, Tesla K80, Tesla K80, Tesla K80, Tesla K80, Tesla K80)
Using default 16 lambda workers.
To maximize the degree of parallelism, add the following code to the beginning of the program:
"turicreate.config.set_runtime_config('TURI_DEFAULT_NUM_PYLAMBDA_WORKERS', 32)"
Note that increasing the degree of parallelism also increases the memory footprint.
---------------------------------------------------------------------------
MXNetError Traceback (most recent call last)
_ctypes/callbacks.c in 'calling callback function'()
~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/kvstore.py in updater_handle(key, lhs_handle, rhs_handle, _)
81 lhs = _ndarray_cls(NDArrayHandle(lhs_handle))
82 rhs = _ndarray_cls(NDArrayHandle(rhs_handle))
---> 83 updater(key, lhs, rhs)
84 return updater_handle
85
~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/optimizer/optimizer.py in __call__(self, index, grad, weight)
1528 self.sync_state_context(self.states[index], weight.context)
1529 self.states_synced[index] = True
-> 1530 self.optimizer.update_multi_precision(index, weight, grad, self.states[index])
1531
1532 def sync_state_context(self, state, context):
~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/optimizer/optimizer.py in update_multi_precision(self, index, weight, grad, state)
553 use_multi_precision = self.multi_precision and weight.dtype == numpy.float16
554 self._update_impl(index, weight, grad, state,
--> 555 multi_precision=use_multi_precision)
556
557 @register
~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/optimizer/optimizer.py in _update_impl(self, index, weight, grad, state, multi_precision)
535 if state is not None:
536 sgd_mom_update(weight, grad, state, out=weight,
--> 537 lazy_update=self.lazy_update, lr=lr, wd=wd, **kwargs)
538 else:
539 sgd_update(weight, grad, out=weight, lazy_update=self.lazy_update,
~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/ndarray/register.py in sgd_mom_update(weight, grad, mom, lr, momentum, wd, rescale_grad, clip_gradient, out, name, **kwargs)
~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/_ctypes/ndarray.py in _imperative_invoke(handle, ndargs, keys, vals, out)
90 c_str_array(keys),
91 c_str_array([str(s) for s in vals]),
---> 92 ctypes.byref(out_stypes)))
93
94 if original_output is not None:
~/anaconda3/envs/amazonei_mxnet_p36/lib/python3.6/site-packages/mxnet/base.py in check_call(ret)
144 """
145 if ret != 0:
--> 146 raise MXNetError(py_str(_LIB.MXGetLastError()))
147
148
MXNetError: Cannot find argument 'lazy_update', Possible Arguments:
----------------
lr : float, required
Learning rate
momentum : float, optional, default=0
The decay rate of momentum estimates at each epoch.
wd : float, optional, default=0
Weight decay augments the objective function with a regularization term that penalizes large weights. The penalty scales with the square of the magnitude of each weight.
rescale_grad : float, optional, default=1
Rescale gradient to grad = rescale_grad*grad.
clip_gradient : float, optional, default=-1
Clip gradient to the range of [-clip_gradient, clip_gradient] If clip_gradient <= 0, gradient clipping is turned off. grad = max(min(grad, clip_gradient), -clip_gradient).
, in operator sgd_mom_update(name="", wd="0.0005", momentum="0.9", clip_gradient="0.025", rescale_grad="1.0", lr="0.001", lazy_update="True")
Interestingly, if I use CUDA 10.0 instead, with Turi Create 5.6:
!pip install turicreate==5.6
!pip uninstall -y mxnet
!pip install mxnet-cu100==1.4.0.post0
the notebook still fails, but if I then immediately uninstall turicreate and mxnet-cu100 and retry the steps above for CUDA 8.0, it works fine.
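Concretely, the retry sequence that works is just the same steps as above, collected into one cell:
!pip uninstall -y turicreate mxnet-cu100
!pip install turicreate==5.4
!pip install mxnet-cu80==1.1.0
!sudo rm /usr/local/cuda
!sudo ln -s /usr/local/cuda-8.0 /usr/local/cuda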
The last time it worked, after restarting the instance I tried pip freeze > requirements.txt followed by pip install -r requirements.txt, but I still hit the same error as above (unless I try with CUDA 10.0 first). What is going on here? Any suggestions appreciated.
Answer 0 (score: 0)
Your update from mxnet 1.1.0 to 1.4.0 is the correct fix. The bug appears to be unrelated to the CUDA version; it lies in MXNet itself.
The mxnet 1.1.0 source code (https://github.com/apache/incubator-mxnet) does not have the lazy_update parameter
"version": "1.0.0",
"description": "Custom bot for Chinese Discord server.",
"main": "index.js",
"scripts": {
"start": "node index.js",
"dev": "nodemon index.js",
"build": "next build"
},
"keywords": [],
"author": "Jacob Villorente",
"license": "ISC",
"dependencies": {
"discord.js": "^11.4.2",
"discord.js-commando": "^0.10.0",
"dotenv": "^7.0.0",
"express": "^4.17.1",
"node-fetch": "^2.6.0",
"ytdl-core": "^1.0.3"
},
"devDependencies": {
"nodemon": "^1.18.11"
}
}
on the sgd_mom_update function.
You can see this by comparing the sgd_mom_update call in the optimizer code at mxnet release tag 1.4.0 with the optimizer code at release tag 1.1.0.
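For example, a minimal sketch of that comparison, assuming the release tags are named 1.1.0 and 1.4.0 and accounting for the optimizer module having moved between those releases:
git clone https://github.com/apache/incubator-mxnet
cd incubator-mxnet
# python/mxnet/optimizer.py (1.1.0) became python/mxnet/optimizer/optimizer.py by 1.4.0
git diff 1.1.0 1.4.0 -- python/mxnet/optimizer.py python/mxnet/optimizer/optimizer.py | grep -n lazy_update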
These changes to sgd_mom_update are included in mxnet>=1.3.0, which is why your testing on those versions succeeded.
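If you want to check which arguments your installed backend actually exposes, one way (a sketch) is to inspect the generated operator wrapper, since mxnet builds it from the C++ operator registry at import time:
import mxnet as mx
# The docstring lists exactly the arguments the installed backend accepts;
# lazy_update should only appear on mxnet>=1.3.0.
help(mx.nd.sgd_mom_update)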