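Tensorboard launched from the Azure ML package does not work

Date: 2019-08-21 10:08:18

Tags: python azure azure-machine-learning-service

I want to access the tfevent file that was created during training and stored in the logs of the Azure ML service. The file opens and displays correctly in a regular TensorBoard, so it is not corrupted. But when I access it through Azure ML's Tensorboard library, the local TensorBoard either shows nothing or the connection is refused.

At first I wrote the logs to ./logs/tensorboard, mirroring how Azure ML uses ./logs/azureml, but the TensorBoard launched by Azure ML's module reported in the browser that there were no files to display, as follows.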

No dashboards are active for the current data set.
Probable causes:

You haven’t written any data to your event files.
TensorBoard can’t find your event files.
If you’re new to using TensorBoard, and want to find out how to add data and set up your event files, check out the README and perhaps the TensorBoard tutorial.
If you think TensorBoard is configured properly, please see the section of the README devoted to missing data problems and consider filing an issue on GitHub.

Last reload: Wed Aug 21 2019 *****
Data location: /tmp/tmpkfj7gswu

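So I assumed AML could not recognize the save location and changed it to ./logs. The browser then showed "This site can't be reached. ****** refused to connect."

My Azure ML Python SDK version is 1.0.57.

1) How can I fix this?

2) Where should I save the tfevent file so that AML can recognize it? I could not find any information about this in the documentation here: https://docs.microsoft.com/en-us/python/api/azureml-tensorboard/azureml.tensorboard.tensorboard?view=azure-ml-py

This is how I launch TensorBoard through Azure ML.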

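import argparse

# VERSION is assumed to be the SDK version string exposed by azureml.core.
from azureml.core import VERSION
from azureml.tensorboard import Tensorboard

# get_logger, get_workspace, get_experiment and get_run are helper functions
# defined elsewhere (not shown).

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description='Launch TensorBoard against the run history of a '
        'machine learning experiment that outputs TensorBoard logs')
    parser.add_argument('--experiment-name',
                        dest='experiment_name',
                        type=str,
                        help='experiment name in Azure ML')
    parser.add_argument('--run-id',
                        dest='run_id',
                        type=str,
                        help='ID of the run whose TensorBoard logs to show')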

    args = parser.parse_args()

    logger = get_logger(__name__)
    logger.info(f'SDK Version: {VERSION}')

    workspace = get_workspace()
    experiment_name = args.experiment_name
    run_id = args.run_id
    experiment = get_experiment(experiment_name, workspace, logger)
    run = get_run(experiment, run_id)

    # The Tensorboard constructor takes an array of runs, so pass it in as a single-element array here
    tb = Tensorboard([run])

    # If successful, start() returns a string with the URI of the instance.
    url = tb.start()
    print(url)

1 answer:

Answer 0 (score: 1)

Tensorboard support in AzureML is designed to work as follows:

  1. You train the model on an AMLCluster or an attached VM and write the Tensorboard log files to the ./logs directory (see here for an example run script).

     from azureml.train.dnn import TensorFlow

     script_params = {"--log_dir": "./logs"}

     # If you want the run to go longer, set --max_steps to a higher number.
     # script_params["--max_steps"] = "5000"

     tf_estimator = TensorFlow(source_directory=exp_dir,
                               compute_target=attached_dsvm_compute,
                               entry_script='mnist_with_summaries.py',
                               script_params=script_params)

     run = exp.submit(tf_estimator)

  2. On your local machine or a Notebook VM, you start an azureml.tensorboard.Tensorboard instance, which keeps pulling logs from the run and writes them to local disk. It also starts a Tensorboard instance that you can point your browser at.

     # The constructor expects a list of runs.
     tb = Tensorboard([run])
     # If successful, start() returns a string with the URI of the instance.
     tb.start()

If done on a local machine, the URL will be http://localhost:6000 (or the machine's hostname); on a Notebook VM, the URL follows a different, VM-specific format.
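A "refused to connect" error in the browser generally means that nothing is accepting connections at the host and port in the URL. As a minimal diagnostic sketch (standard library only; the helper name `port_is_open` is an illustrative assumption and not part of the AzureML SDK), one could test the address returned by `tb.start()` before opening it in a browser:

```python
import socket
from urllib.parse import urlparse


def port_is_open(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False
```

If `port_is_open(url)` returns `False` for the URL that `tb.start()` printed, the Tensorboard process is probably not running or not reachable at that address. That is a different situation from Tensorboard running but finding no event files.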

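Below is a diagram of how a run executes in AzureML. Items #6 and #7 are the relevant ones here: they show how the Tensorboard logs travel from the compute target to the machine running the actual Tensorboard. In this case that is "My Computer", but it could also be a Notebook VM. [Image: diagram of an AzureML run]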
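To check whether the logs actually reached the machine running Tensorboard, one could look for event files under the directory that Tensorboard reports as its "Data location". The sketch below uses only the standard library and relies on TensorFlow's convention of putting `tfevents` in event file names; the helper name `find_event_files` is an illustrative assumption, not part of the AzureML SDK.

```python
import os


def find_event_files(root):
    """Return sorted paths of files under `root` whose name contains 'tfevents'."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if "tfevents" in name:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)
```

An empty result for the reported data location (for example `/tmp/tmpkfj7gswu` in the question) would be consistent with the "No dashboards are active for the current data set" message.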