Inception v3: how retrain.py splits images into training, testing, and validation sets

Time: 2019-10-15 12:45:24

Tags: python-3.x tensorflow

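I have been using TensorFlow's retrain.py script with the Inception v3 model to build a multi-label classification model. I found some code in the script that I don't fully understand, and I'm looking for help clarifying what it does: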

    for file_name in file_list:
        base_name = os.path.basename(file_name)
        # We want to ignore anything after '_nohash_' in the file name when
        # deciding which set to put an image in, the data set creator has a way of
        # grouping photos that are close variations of each other. For example
        # this is used in the plant disease data set to group multiple pictures of
        # the same leaf.
        hash_name = re.sub(r'_nohash_.*$', '', file_name)
        # This looks a bit magical, but we need to decide whether this file should
        # go into the training, testing, or validation sets, and we want to keep
        # existing files in the same set even if more files are subsequently
        # added.
        # To do that, we need a stable way of deciding based on just the file name
        # itself, so we do a hash of that and then use that to generate a
        # probability value that we use to assign it.
        hash_name_hashed = hashlib.sha1(compat.as_bytes(hash_name)).hexdigest()
        percentage_hash = ((int(hash_name_hashed, 16) %
                            (MAX_NUM_IMAGES_PER_CLASS + 1)) *
                           (100.0 / MAX_NUM_IMAGES_PER_CLASS))
        if percentage_hash < validation_percentage:
            validation_images.append(base_name)
        elif percentage_hash < (testing_percentage + validation_percentage):
            testing_images.append(base_name)
        else:
            training_images.append(base_name)

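Is there a way to name the images so that I can control which ones go into the training, validation, or testing set? A link to the plant disease data set mentioned in the comments would also help. Thanks!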
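For reference, the assignment can be reproduced outside retrain.py to see where a given name would land. The following is a minimal sketch, not the script's own code: `which_set` and `find_name_for_set` are hypothetical helpers, `str.encode('utf-8')` stands in for `compat.as_bytes`, and the value of `MAX_NUM_IMAGES_PER_CLASS` is an assumption that should be checked against your copy of the script. Note that the quoted code hashes `file_name` rather than `base_name`, so any directory prefix in that string is part of the hash.

```python
import hashlib
import re

# Assumption: retrain.py defines this constant as 2 ** 27 - 1; check your copy.
MAX_NUM_IMAGES_PER_CLASS = 2 ** 27 - 1


def which_set(file_name, validation_percentage, testing_percentage):
    """Return 'validation', 'testing' or 'training' for one file name.

    Hypothetical helper mirroring the quoted loop body.
    """
    # Everything from '_nohash_' onward is dropped before hashing, so
    # variants of the same photo share one hash and one set.
    hash_name = re.sub(r'_nohash_.*$', '', file_name)
    hashed = hashlib.sha1(hash_name.encode('utf-8')).hexdigest()
    percentage_hash = ((int(hashed, 16) % (MAX_NUM_IMAGES_PER_CLASS + 1)) *
                       (100.0 / MAX_NUM_IMAGES_PER_CLASS))
    if percentage_hash < validation_percentage:
        return 'validation'
    if percentage_hash < testing_percentage + validation_percentage:
        return 'testing'
    return 'training'


def find_name_for_set(stem, target, validation_percentage, testing_percentage,
                      max_tries=10000):
    """Try stem0.jpg, stem1.jpg, ... until one hashes into the target set."""
    for i in range(max_tries):
        candidate = '%s%d.jpg' % (stem, i)
        if which_set(candidate, validation_percentage,
                     testing_percentage) == target:
            return candidate
    return None  # No candidate landed in the target set within max_tries.


# Two shots of the same leaf always end up in the same set.
a = which_set('flowers/rose/leaf7_nohash_0.jpg', 10, 10)
b = which_set('flowers/rose/leaf7_nohash_1.jpg', 10, 10)
print(a == b)  # True

# Search for a name that lands in the testing set at a 10/10 split.
print(find_name_for_set('flowers/rose/img', 'testing', 10, 10))
```

So the name does decide the set, but only through the hash: a set cannot be chosen by writing a keyword into the name. The options under this scheme are to share a prefix before `_nohash_` so related images stay together, or to try names until one hashes into the desired range, as the second helper does. The result holds only for the exact path string and percentages passed in.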

0 answers:

No answers yet.