Why is the TensorBoard summary not updating?

Time: 2019-05-04 08:21:50

Tags: tensorflow pytorch tensorboard

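I am using TensorBoard with PyTorch 1.1 to log loss values.

I call writer.add_scalar("loss", loss.item(), global_step) in the body of each for loop.

However, the plot does not update while training is running.

Every time I want to see the latest loss, I have to restart the TensorBoard server.

Here is the code: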

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torch.utils.tensorboard import SummaryWriter
from torchvision import datasets, transforms

# Writer will output to ./runs/ directory by default
writer = SummaryWriter()

transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))]
)
trainset = datasets.MNIST("mnist_train", train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
model = torchvision.models.resnet50(False)
# Have ResNet model take in grayscale rather than RGB
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(2048, 10, True)

criterion = nn.CrossEntropyLoss()

epochs = 100

opt = torch.optim.Adam(model.parameters())

niter = 0

for epoch in range(epochs):
    for step, (x, y) in enumerate(trainloader):
        yp = model(x)
        loss = criterion(yp, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        writer.add_scalar("loss", loss.item(), niter)
        niter += 1
        print(loss.item())

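# Take one batch of images to log as a grid and to trace the model graph
images, _ = next(iter(trainloader))
grid = torchvision.utils.make_grid(images)
writer.add_image("images", grid, 0)
writer.add_graph(model, images)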
writer.close()

Training is still running and the global step has reached 3594. However, TensorBoard still only shows data up to around step 1900.


2 answers:

Answer 0 (score: 5)

For those who have multiple event log files in a single run, you also need to start TensorBoard with --reload_multifile True.
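To check whether this answer applies, it can help to see which run directories hold more than one event file. The following is a minimal sketch that uses only the standard library. The helper names event_files_by_dir and tensorboard_argv are hypothetical, and it assumes that event files can be recognised by "tfevents" in their file name and that the log directory is ./runs (the SummaryWriter default mentioned in the question).

```python
import os
from collections import defaultdict


def event_files_by_dir(logdir):
    """Map each directory under logdir to the event files it contains."""
    found = defaultdict(list)
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            # Assumption: TensorBoard event files carry "tfevents" in their name.
            if "tfevents" in name:
                found[root].append(name)
    return {d: sorted(names) for d, names in found.items()}


def tensorboard_argv(logdir, reload_multifile=True):
    """Build the TensorBoard command line as a list of arguments."""
    argv = ["tensorboard", "--logdir", logdir]
    if reload_multifile:
        argv += ["--reload_multifile", "true"]
    return argv


if __name__ == "__main__":
    # Report run directories that contain several event files.
    for directory, names in event_files_by_dir("runs").items():
        if len(names) > 1:
            print(directory, "has", len(names), "event files")
    # Print the command; it could be passed to subprocess.run to start the server.
    print(" ".join(tensorboard_argv("runs")))
```

If a directory is reported with several event files, starting TensorBoard with the printed command is worth trying.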

Answer 1 (score: 0)

There is caching going on internally on the logging side. To check whether that is the problem, create your SummaryWriter with a small flush_secs value, for example

writer = SummaryWriter(flush_secs=1)

and see whether things update right away. If so, feel free to tune flush_secs (default 120) for your case. From your description, though, this may come from the TensorBoard visualization side. If so, it must have something to do with the polling interval.

Does installing TensorFlow (which forces TensorBoard to use a different filesystem backend) change this behavior for you?
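One way to tell the two possibilities in this answer apart is to watch the event files on disk while training runs. The sketch below is a rough diagnostic, not part of TensorBoard or PyTorch. The helper names total_event_bytes and watch_growth are hypothetical, and it again assumes that event files have "tfevents" in their name and live under ./runs.

```python
import os
import time


def total_event_bytes(logdir):
    """Sum the sizes of all event files under logdir."""
    total = 0
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            # Assumption: TensorBoard event files carry "tfevents" in their name.
            if "tfevents" in name:
                total += os.path.getsize(os.path.join(root, name))
    return total


def watch_growth(logdir, checks=3, interval=1.0, sleep=time.sleep):
    """Return the total event-file size observed at each check."""
    sizes = []
    for i in range(checks):
        sizes.append(total_event_bytes(logdir))
        if i < checks - 1:
            sleep(interval)
    return sizes


if __name__ == "__main__":
    # Run this in a second terminal while the training script is logging.
    print(watch_growth("runs", checks=5, interval=10.0))
```

If the sizes keep growing while the plot stays stale, the delay is more likely on the TensorBoard side. If they stay flat for long stretches during training, the writer is probably still buffering, and a smaller flush_secs is the thing to try.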