I am trying to train DeepLab v3+ on my own dataset of images and segmentation masks. However, the official GitHub project (https://github.com/tensorflow/models/tree/master/research/deeplab) only provides training on the PASCAL VOC, Cityscapes, and ADE20K datasets, so I tried to follow the VOC dataset layout to build my own. My goal is to segment a table out of a video. I extracted 80 consecutive frames from the video; each image is 1920 * 1080. I used "Labelme" to create table masks for these 80 images. The tables in the frames all look similar. I split the data into 60 images for training and 20 for validation. The directory structure mimics the VOC dataset:
JPEGImages: the RGB color images,
Segmentation: the txt files listing the train, val, and trainval filenames,
SegmentationClass: the mask images with the color map.
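For reference, the split files in Segmentation can be produced with a short script. A minimal sketch, assuming the frames are named v00000 through v00079 (a guess based on the example filename v00019.jpg used later in this post; adjust to the real names):

```python
import os

# Hypothetical frame basenames (v00000 ... v00079); adjust to the real files.
frames = ["v%05d" % i for i in range(80)]
train, val = frames[:60], frames[60:]

os.makedirs("Segmentation", exist_ok=True)
for split, names in [("train", train), ("val", val), ("trainval", frames)]:
    # One image basename (without extension) per line, as VOC expects.
    with open(os.path.join("Segmentation", split + ".txt"), "w") as f:
        f.write("\n".join(names) + "\n")
```

The sizes written here (60/20/80) are the ones that must match the splits_to_sizes entries below.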
Then I modified "download_and_convert_voc2012.sh" to remove the color map and create the TFRecord files, and modified "data_generator.py" to change the VOC dataset information. In other words, I changed the original code:
_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 1464,
        'train_aug': 10582,
        'trainval': 2913,
        'val': 1449,
    },
    num_classes=21,
    ignore_label=255,
)
to:
_PASCAL_VOC_SEG_INFORMATION = DatasetDescriptor(
    splits_to_sizes={
        'train': 60,
        'train_aug': 432,
        'trainval': 80,
        'val': 20,
    },
    num_classes=2,
    ignore_label=255,
)
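The colormap-removal step matters here: DeepLab expects single-channel masks whose pixel values are the class indices (0 = background, 1 = table), not RGB colors. A minimal numpy sketch of that conversion (the palette colors are assumptions; match them to what Labelme actually wrote):

```python
import numpy as np

# Hypothetical palette: background is black, the table class is dark red.
PALETTE = {(0, 0, 0): 0, (128, 0, 0): 1}

def remove_colormap(rgb_mask):
    """Map an HxWx3 RGB mask to an HxW array of class indices."""
    labels = np.zeros(rgb_mask.shape[:2], dtype=np.uint8)
    for color, index in PALETTE.items():
        labels[np.all(rgb_mask == color, axis=-1)] = index
    return labels

# Tiny example: a 2x2 mask with one "table" pixel.
mask = np.zeros((2, 2, 3), dtype=np.uint8)
mask[0, 0] = (128, 0, 0)
labels = remove_colormap(mask)  # class indices [[1, 0], [0, 0]]
```

If this step is skipped or the palette is wrong, every pixel ends up as an unexpected label value and training cannot converge.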
The next step was to train with the command below:
python3 deeplab/train.py \
--logtostderr \
--training_number_of_steps=10000 \
--learning_rate_decay_step=500 \
--train_split="train" \
--base_learning_rate=0.0001 \
--adam_learning_rate=0.0001 \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size="513,513" \
--train_batch_size=1 \
--dataset="pascal_voc_seg" \
--tf_initial_checkpoint="/home/.../models/check_point/xception/model.ckpt.index" \
--train_logdir="/home/.../models/train_log" \
--dataset_dir="/home/.../models/research/deeplab/datasets/pascal_voc_seg/tfrecord"
The initial checkpoint is "xception65_coco_voc_trainval", and the TFRecord dataset was generated from my 80 images. However, over 10,000 steps the training loss just bounced around between roughly 4.8 and 5.5. I tried lowering the learning rate, but the loss did not decrease. I tried changing "--train_crop_size" to "1920,1080", but my computer crashed. Should I resize the images to around "500 * 500"? So far I have not resized my images. After training for 10,000 steps, here are the logs of the last 100 steps:
I0201 21:24:03.829240 140124578015040 learning.py:507] global step 9900: loss = 5.0399 (3.194 sec/step)
INFO:tensorflow:global step 9910: loss = 4.8355 (3.152 sec/step)
I0201 21:24:35.473860 140124578015040 learning.py:507] global step 9910: loss = 4.8355 (3.152 sec/step)
INFO:tensorflow:global step 9920: loss = 4.8337 (3.179 sec/step)
I0201 21:25:07.198329 140124578015040 learning.py:507] global step 9920: loss = 4.8337 (3.179 sec/step)
INFO:tensorflow:global step 9930: loss = 4.8411 (3.156 sec/step)
I0201 21:25:38.925366 140124578015040 learning.py:507] global step 9930: loss = 4.8411 (3.156 sec/step)
INFO:tensorflow:global step 9940: loss = 4.8540 (3.188 sec/step)
I0201 21:26:10.489563 140124578015040 learning.py:507] global step 9940: loss = 4.8540 (3.188 sec/step)
INFO:tensorflow:global step 9950: loss = 4.8426 (3.221 sec/step)
I0201 21:26:42.325965 140124578015040 learning.py:507] global step 9950: loss = 4.8426 (3.221 sec/step)
INFO:tensorflow:global step 9960: loss = 4.8415 (3.130 sec/step)
I0201 21:27:14.000005 140124578015040 learning.py:507] global step 9960: loss = 4.8415 (3.130 sec/step)
INFO:tensorflow:global step 9970: loss = 4.9121 (3.163 sec/step)
I0201 21:27:45.751608 140124578015040 learning.py:507] global step 9970: loss = 4.9121 (3.163 sec/step)
INFO:tensorflow:global step 9980: loss = 4.8351 (3.145 sec/step)
I0201 21:28:17.505195 140124578015040 learning.py:507] global step 9980: loss = 4.8351 (3.145 sec/step)
INFO:tensorflow:global step 9990: loss = 4.8401 (3.153 sec/step)
I0201 21:28:49.131623 140124578015040 learning.py:507] global step 9990: loss = 4.8401 (3.153 sec/step)
INFO:tensorflow:Recording summary at step 9990.
I0201 21:28:50.591169 140119415715584 supervisor.py:1050] Recording summary at step 9990.
INFO:tensorflow:global step 10000: loss = 4.8512 (3.149 sec/step)
I0201 21:29:21.474581 140124578015040 learning.py:507] global step 10000: loss = 4.8512 (3.149 sec/step)
INFO:tensorflow:Stopping Training.
I0201 21:29:21.475129 140124578015040 learning.py:777] Stopping Training.
INFO:tensorflow:Finished training! Saving model to disk.
I0201 21:29:21.475337 140124578015040 learning.py:785] Finished training! Saving model to disk.
/home/.../.local/lib/python3.6/site-packages/tensorflow/python/summary/writer/writer.py:386: UserWarning: Attempting to use a closed FileWriter. The operation will be a noop unless the FileWriter is explicitly reopened.
warnings.warn("Attempting to use a closed FileWriter. "
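As a rough sanity check on those numbers: with num_classes=2, a model that guesses uniformly at random has a per-pixel softmax cross-entropy of about ln(2) ≈ 0.69, so (ignoring regularization terms in the reported loss) a value flat around 4.8 may hint at a label problem, for example masks still containing color-mapped values, rather than only a learning-rate choice. A quick sketch of that baseline:

```python
import math

# Per-pixel cross-entropy of a uniform (random-guess) softmax over k classes.
def uniform_ce(num_classes):
    return -math.log(1.0 / num_classes)

two_class = uniform_ce(2)    # ~0.693 for this 2-class setup
voc_classes = uniform_ce(21)  # ~3.045 for the original 21-class VOC setup
```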
After training, I exported the model with the command below:
python3 deeplab/export_model.py \
--logtostderr \
--checkpoint_path="/home/.../models/train_log/model.ckpt-10000" \
--export_path="/home/.../models/mode_export/table_seg_graph.pb" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--num_classes=2 \
--crop_size=513 \
--crop_size=513 \
--inference_scales=1.0
That gave me the frozen model "table_seg_graph.pb". After that, I tried to modify "deeplab_demo.ipynb" to use this model file. I removed the code that extracts the tar file and added the code below to load the pb file:
class DeepLabModel(object):
    def __init__(self, tarball_path):
        ...
        with tf.gfile.GFile(tarball_path, "rb") as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
        ...
    ...
I also replaced the model-download part with my model path:
download_path = '/home.../models/mode_export/table_seg_graph.pb'
MODEL = DeepLabModel(download_path)
Then I loaded a table image to test the model:
def run_visualization(url):
    try:
        original_im = Image.open('/home/.../models/research/deeplab/datasets/pascal_voc_seg/Tabledata/JPEGImages/v00019.jpg')
    except IOError:
        ...
However, the result is empty: nothing is detected and the output mask is completely black. The video is copyrighted, so I cannot show you the resulting images.
My only guess is that I should resize the 1920 * 1080 frames to 513 * 513; beyond that I am out of ideas. Can anyone tell me which parameters I should choose?
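If resizing is the route taken, one caveat worth noting: the RGB frames can be resized with any interpolation, but the masks must use nearest-neighbor, or interpolation will invent label values that are neither 0 nor 1. A minimal Pillow sketch (a standalone illustration, not the deeplab pipeline's own resizing code):

```python
from PIL import Image

def resize_pair(image, mask, size=(513, 513)):
    """Resize an RGB frame and its label mask to the same target size."""
    # BILINEAR is fine for the photo...
    resized_image = image.resize(size, Image.BILINEAR)
    # ...but the mask needs NEAREST so no new label values are invented.
    resized_mask = mask.resize(size, Image.NEAREST)
    return resized_image, resized_mask

# Usage with a synthetic 1920x1080 frame and an all-background mask.
frame = Image.new("RGB", (1920, 1080))
label = Image.new("L", (1920, 1080))
small_frame, small_label = resize_pair(frame, label)
```

Note that squashing 1920 * 1080 straight to 513 * 513 changes the aspect ratio; cropping or padding to a square first is a common alternative.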
Which "atrous_rates" should I choose?
Which "learning rate" should I set?
Which "crop_size" should I use?