I'm trying to use tensorboardX to debug a PyTorch NN running on an AWS p2.xlarge instance.
I followed this tutorial to open port 6006.
The model is running, and tensorboardX is creating its writer files. While it runs I get the following warning, and I'm not sure how important it is:
WARNING:root:tuple appears in op that does not forward tuples (VisitNode at /pytorch/torch/csrc/jit/passes/lower_tuples.cpp:117)
frame #0: std::function::operator()() const + 0x11 (0x7fbe3dd04441 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7fbe3dd03d7a in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: + 0xaf61f5 (0x7fbe3cdc41f5 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #3: + 0xaf6464 (0x7fbe3cdc4464 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #4: torch::jit::LowerAllTuples(std::shared_ptr&) + 0x13 (0x7fbe3cdc44a3 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #5: + 0x3f84b4 (0x7fbe7d2cb4b4 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x130cfc (0x7fbe7d003cfc in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #40: __libc_start_main + 0xf0 (0x7fbe8d69c830 in /lib/x86_64-linux-gnu/libc.so.6)
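Before troubleshooting TensorBoard itself, it may be worth confirming that the writer files really are where --logdir=runs will look. The sketch below is a hypothetical bash helper (list_event_files is not part of tensorboardX or TensorBoard). It assumes tensorboardX's usual event-file naming, events.out.tfevents.*, somewhere under the runs directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list TensorBoard event files under a log directory.
# Assumes the writer uses the usual "events.out.tfevents.*" file names.
list_event_files() {
    local logdir="${1:-runs}"
    if [ ! -d "$logdir" ]; then
        echo "no such directory: $logdir" >&2
        return 1
    fi
    local files
    # Search recursively, since each run usually gets its own subdirectory.
    files="$(find "$logdir" -type f -name 'events.out.tfevents.*' | sort)"
    if [ -z "$files" ]; then
        echo "no event files under $logdir" >&2
        return 1
    fi
    printf '%s\n' "$files"
}
```

Run it from PATH_TO_FOLDER_CONTAINING_runs as list_event_files runs. A non-zero exit status means TensorBoard would have nothing to display even if it started correctly.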
The problem is that I can't reach the TensorBoard UI in the browser. These are the steps I take:
$ cd PATH_TO_FOLDER_CONTAINING_runs
$ source activate pytorch_p36
$ tensorboard --logdir=runs
at which point I get this error message:
Segmentation fault (core dumped)
When I check the system log, /var/log/syslog, I see the following:
Jun 26 09:06:40 ip-172-xx-xx-xxx kernel: [515315.598917] tensorboard[1446]: segfault at 0 ip (null) sp 00007ffd64c5f178 error 14 in python2.7[55d8673d1000+1000]
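The kernel log names python2.7 as the crashing binary, even though pytorch_p36 is a Python 3.6 environment (its packages live under lib/python3.6 in the warning above). That suggests, but does not prove, that the tensorboard command found on PATH does not belong to the active environment. A minimal check is sketched below as a hypothetical bash function, which_interpreter (not a conda or TensorBoard command). It assumes tensorboard is a script whose first line is a #! shebang naming its Python interpreter, as pip and conda console scripts usually are.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show which file a command name resolves to on the
# current PATH and the interpreter named in its shebang line.
which_interpreter() {
    local cmd="${1:-tensorboard}"
    local path first_line
    # 'command -v' prints the resolved path, or fails if the name is unknown.
    if ! path="$(command -v "$cmd")" || [ ! -f "$path" ]; then
        echo "$cmd: no executable file found on PATH" >&2
        return 1
    fi
    echo "executable:  $path"
    # Console scripts usually start with '#!/path/to/python'.
    IFS= read -r first_line < "$path" || true
    case "$first_line" in
        '#!'*) echo "interpreter: ${first_line#'#!'}" ;;
        *)     echo "interpreter: (no shebang line)" ;;
    esac
}
```

Running which_interpreter tensorboard once with pytorch_p36 active and once with another environment active shows whether the two resolve to different executables. The exact paths depend on the AMI.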
My Googling skills are falling short here. How can I access TensorBoard from a browser when it runs on an AWS instance?
Please let me know if anything is unclear or if any information is missing.
Answer
Even though the training code has to run in the pytorch_p36 environment, TensorBoard itself has to be run from a different environment.
The sequence of commands in the terminal should be:
$ cd PATH_TO_FOLDER_CONTAINING_runs
$ source activate tensorflow_p27
$ tensorboard --logdir=runs
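The same three steps can be wrapped in a small bash function so the environment switch is not forgotten. This is a sketch under assumptions: the old-style conda source activate command used above is available, tensorflow_p27 is the environment from this answer, and TensorBoard's --port flag is set to the port opened earlier (6006). The name run_tensorboard is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the three commands above.
# Assumes 'source activate <env>' works (old-style conda activation) and that
# the chosen env provides a working 'tensorboard' executable.
run_tensorboard() {
    if [ $# -lt 1 ]; then
        echo "usage: run_tensorboard PATH_TO_FOLDER_CONTAINING_runs [env] [port]" >&2
        return 2
    fi
    local parent_dir="$1"
    local env_name="${2:-tensorflow_p27}"   # env from the answer, not pytorch_p36
    local port="${3:-6006}"                 # port opened earlier via the tutorial
    # Subshell: the cd and env activation do not leak into the calling shell.
    (
        cd "$parent_dir" || exit 1
        if [ ! -d runs ]; then
            echo "no 'runs' directory in $parent_dir" >&2
            exit 1
        fi
        source activate "$env_name" || exit 1
        exec tensorboard --logdir=runs --port="$port"
    )
}
```

Usage: run_tensorboard PATH_TO_FOLDER_CONTAINING_runs. Running everything in a subshell keeps the terminal in pytorch_p36 afterwards, so training commands issued from the same shell are unaffected.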
Then open the specified port in the browser.