Question

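I am working on a project that uses a ConvNet to scan whole-slide image (WSI) files. WSIs are very large, though; for example, 200,000 x 100,000 pixels. So I can only read one small patch (e.g., 256x256) at a time, using another library called openslide.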

However, because of the global interpreter lock (GIL) in Python, everything runs sequentially, which makes the program really slow.

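Ideally, I would like to implement a new operation in TensorFlow. It would take a WSI filename as input, continuously read patches from the WSI, and output them to the TensorFlow graph. Note that this operation should run concurrently with the other operations defined in the TensorFlow graph. Is that possible?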

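For me, the best solution would be as follows: a C++ thread continuously enqueues patches into a TensorFlow queue (e.g., tf.FIFOQueue), and in the main program (Python interface) the graph dequeues one or more patches from the queue in a session; that is, the C++ interface should take an additional queue argument. However, the TensorFlow C++ interface seems rather limited. Are there any other options?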

Thanks a lot!

Answer 1

I don't know whether you still care about this problem, but I implemented a small-patch reader for TensorFlow. It uses tf.FIFOQueue, threads, and openslide-python. https://github.com/OtaYuji/openslider-tfpy
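For readers who want the general idea without the library, here is a minimal sketch of the producer/consumer pattern behind a queue-fed patch reader. It is not the library's actual code: it uses only the standard library (`queue.Queue` instead of tf.FIFOQueue), and `read_patch`, `producer`, and `consume_all` are hypothetical names. With openslide-python, the body of `read_patch` would instead call `slide.read_region(location, level, size)`.

```python
import queue
import threading


def read_patch(location, level, size):
    """Hypothetical stand-in for a real patch reader.

    With openslide-python this would be slide.read_region(location, level, size).
    """
    return {"location": location, "level": level, "size": size}


def producer(params, size, patch_queue, stop_event):
    """Read patches one by one and put them on a bounded queue."""
    for location, level in params:
        if stop_event.is_set():
            break
        # put() blocks while the queue is full, so the reader cannot run
        # arbitrarily far ahead of the consumer.
        patch_queue.put(read_patch(location, level, size))
    patch_queue.put(None)  # sentinel: no more patches


def consume_all(params, size=(128, 128), capacity=4):
    """Start a reader thread and collect every patch it produces."""
    patch_queue = queue.Queue(maxsize=capacity)
    stop_event = threading.Event()
    worker = threading.Thread(
        target=producer,
        args=(params, size, patch_queue, stop_event),
        daemon=True,
    )
    worker.start()

    patches = []
    while True:
        patch = patch_queue.get()
        if patch is None:
            break
        patches.append(patch)  # a model would process the patch here
    worker.join()
    return patches
```

Calling `consume_all([((0, 0), 0), ((128, 0), 0)])` returns the two stand-in patches in the order they were read. The bounded queue is the main design choice: it keeps memory use flat while the reader thread overlaps I/O with whatever the consumer is doing.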

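With this library you can feed small patch images into a TensorFlow model. For example, suppose you have trained a machine learning model to detect mitotic cells in breast cancer and want to apply the trained model to the whole area of a new pathology slide; the following code is then a solution.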


import tensorflow as tf

from openslidertfpy import MicroImageReader


openslide_read_region_params = [
    ((0, 0), 0),  # location and level for openslide's read_region func
    ((128, 0), 0),
    ((256, 0), 1),
    ...  # And so on...
]
image_width, image_height = 128, 128

with tf.Graph().as_default():
    coordinator = tf.train.Coordinator()
    runner = MicroImageReader(
        "sample.svs", coordinator, image_width, image_height
    )
    images, locations_batch, levels_batch = runner.get_inputs()
    # Tensors for the patch images and their read_region parameters (location, level)

    results = some_op(images)  # Place your ML model here

    with tf.Session() as sess:

        # Some function to initialize values should be placed here

        # Start reading pathology images
        runner.start_thread(sess)
        while not coordinator.should_stop():
            actual_results, locs, levs = sess.run(
                [results, locations_batch, levels_batch]
            )
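The parameter list in the example above is written out by hand. To cover a whole slide, it could be generated instead. The sketch below is one possible way to do that, not part of the library: `grid_read_region_params` is a hypothetical helper that assumes non-overlapping patches at level 0, where openslide's location coordinates and patch sizes use the same pixel units.

```python
def grid_read_region_params(slide_width, slide_height,
                            patch_width, patch_height, level=0):
    """Return ((x, y), level) tuples for non-overlapping patches, row by row.

    Assumes level 0, so the stride equals the patch size. Partial patches
    at the right and bottom edges are skipped.
    """
    params = []
    for y in range(0, slide_height - patch_height + 1, patch_height):
        for x in range(0, slide_width - patch_width + 1, patch_width):
            params.append(((x, y), level))
    return params


# A 256x128 region split into 128x128 patches yields two locations.
example_params = grid_read_region_params(256, 128, 128, 128)
```

For higher levels the stride would need to be scaled by that level's downsample factor, because openslide expects locations in the level-0 reference frame.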

How to implement a new operation in TensorFlow that reads image patches from a large image
