I am using TensorBoard with PyTorch 1.1 to log loss values.
I call writer.add_scalar("loss", loss.item(), global_step) in the body of each for loop.
However, the plot does not update while training is running.
Every time I want to see the latest loss, I have to restart the TensorBoard server.
Here is the code:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
from torch.utils.tensorboard import SummaryWriter
from torchvision import datasets, transforms

# Writer will output to ./runs/ directory by default
writer = SummaryWriter()

transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))]
)
trainset = datasets.MNIST("mnist_train", train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

model = torchvision.models.resnet50(False)
# Have ResNet model take in grayscale rather than RGB
model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
model.fc = nn.Linear(2048, 10, True)

criterion = nn.CrossEntropyLoss()
epochs = 100
opt = torch.optim.Adam(model.parameters())

niter = 0
for epoch in range(epochs):
    for step, (x, y) in enumerate(trainloader):
        yp = model(x)
        loss = criterion(yp, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        writer.add_scalar("loss", loss.item(), niter)
        niter += 1
        print(loss.item())

# grab one batch so `images` is defined for the image/graph summaries
images, _ = next(iter(trainloader))
grid = torchvision.utils.make_grid(images)
writer.add_image("images", grid, 0)
writer.add_graph(model, images)
writer.close()
Training is still running and the global step has reached 3594, but TensorBoard still shows roughly step 1900.
Answer 0 (score: 5)
For anyone running with more than one event log file at a time, you also need to pass --reload_multifile True.
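As a sketch, the flag is passed when launching the TensorBoard server, assuming logs live under ./runs/ (the SummaryWriter default):

```shell
# poll every event file under runs/, not just the most recent one per run directory
tensorboard --logdir runs --reload_multifile True
```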
Answer 1 (score: 0)
There is caching going on internally on the logging side. To see whether that is the issue, create your SummaryWriter with SummaryWriter(flush_secs=1) and check whether things update right away. If so, feel free to tune flush_secs (the default is 120) for your situation. That said, based on your description the problem may instead be on the TensorBoard visualization side, in which case it has to do with the polling interval.
Does installing TensorFlow (which forces TensorBoard to use a different filesystem backend) change this behavior for you?
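A minimal sketch of the logging-side fix described above, with an explicit log_dir chosen here for illustration; writer.flush() forces pending events to disk immediately (available in newer torch versions, not in 1.1):

```python
from torch.utils.tensorboard import SummaryWriter

# flush events to disk every second instead of the 120 s default
writer = SummaryWriter(log_dir="runs/demo", flush_secs=1)

for step in range(10):
    writer.add_scalar("loss", 1.0 / (step + 1), step)

writer.flush()  # push any buffered events to disk right away
writer.close()
```

With flush_secs=1, the event file on disk stays close to the live training state, so a running TensorBoard server only has to re-poll it to show fresh points.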