Question

I'm working on moving my (messy) code from TensorFlow core to the Estimator paradigm, especially using Experiments with learn_runner.run. But I'm actually having trouble feeding data to my neural network.

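What I'm trying to achieve is actually very close to what all the TensorFlow examples do with tf.TextLineReader, e.g. https://github.com/GoogleCloudPlatform/cloudml-samples/blob/master/census/customestimator/trainer/model.py#L297, except that I load the data from a web service rather than from a file on disk.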

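From my understanding (and from looking at the code of tensorflow.python.estimator._train_model()), input_fn is called only once, not at every iteration. I could easily load all my data and then do something like:

import tensorflow as tf

def input_fn():
    data = ...  # placeholder: all data loaded in memory
    batch = tf.train.input_producer(tf.constant(data))
    return batch.dequeue_many(batch_size)

But this is not sustainable, because my data does not fit in memory. I'm trying to do something like:

1. load a first piece of data (say N lines)
2. consume it by batches in a queue, just like the input_fn above
2'. feed this queue asynchronously with new data when it's almost empty

I know how to do this in "pure" TF, e.g. How to prefetch data using a custom python function in tensorflow or Tensorflow: custom data load + asynchronous computation, but I find it difficult to transpose to the Experiment paradigm, since I have access neither to the session to load things myself nor to the graph to append operations to it.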
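The load / consume in batches / refill asynchronously pattern described here can be sketched in plain Python, independent of TensorFlow. This is only a rough illustration under stated assumptions: fetch_chunk is a hypothetical stand-in for the web-service call, and the chunk and buffer sizes are arbitrary. It uses a standard-library queue.Queue, not a TensorFlow queue.

```python
import queue
import threading


def fetch_chunk(chunk_index, chunk_size):
    """Hypothetical stand-in for a web-service call; returns [] when exhausted."""
    total_rows = 10  # assumption: the service holds 10 rows in total
    start = chunk_index * chunk_size
    return list(range(start, min(start + chunk_size, total_rows)))


def start_prefetcher(chunk_size, max_buffered_rows):
    """Fill a bounded queue from a background thread; None marks the end."""
    buffer = queue.Queue(maxsize=max_buffered_rows)

    def worker():
        chunk_index = 0
        while True:
            rows = fetch_chunk(chunk_index, chunk_size)
            if not rows:
                buffer.put(None)  # sentinel: no more data
                return
            for row in rows:
                buffer.put(row)  # blocks while the queue is full
            chunk_index += 1

    threading.Thread(target=worker, daemon=True).start()
    return buffer


def batches(buffer, batch_size):
    """Consume the queue in batches until the sentinel is seen."""
    batch = []
    while True:
        row = buffer.get()
        if row is None:
            break
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller batch
```

Because the queue is bounded, the worker only fetches the next chunk once the consumer has drained enough rows, so at most max_buffered_rows rows plus one chunk are held in memory at any time.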

EDIT

I managed to do it using tf.py_func().

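It works pretty well, though it is a bit slower (as expected, the round trip from C++ execution to Python introduces about a 50% delay). I'm trying to work around this by putting the Python data that the reader reads asynchronously into a dedicated TensorFlow queue, so that loading can be done without passing data from Python to C++ (just as in the two links above).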

Answer 1

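I had a similar issue and found a workaround using SessionRunHook. This hook (there are others as well) lets you initialize operations right after the session is created.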

Answer 2

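tf.data.Dataset.from_generator creates a dataset that calls your function to generate the data one example at a time. This lets you program the data generation however you like, for example loading data in batches and then yielding one example from the current batch on each call. This other question has an example.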
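As a rough sketch of the kind of generator this describes: it loads one chunk at a time but yields a single example per iteration. fetch_chunk, the seven sample rows, and the (feature, label) layout are assumptions made for illustration; a real version would call the web service instead.

```python
def fetch_chunk(offset, limit):
    """Hypothetical stand-in for a web-service request; returns [] at the end."""
    rows = [(float(i), i % 2) for i in range(7)]  # assumption: 7 (feature, label) rows
    return rows[offset:offset + limit]


def example_generator(chunk_size=3):
    """Load one chunk at a time, but yield a single example per iteration."""
    offset = 0
    while True:
        chunk = fetch_chunk(offset, chunk_size)
        if not chunk:
            return  # service exhausted
        for feature, label in chunk:
            yield feature, label
        offset += len(chunk)


# With TensorFlow installed, the generator could be wrapped roughly like this
# (not executed here; batch_size is whatever the input_fn already uses):
#   dataset = tf.data.Dataset.from_generator(
#       example_generator, output_types=(tf.float32, tf.int32))
#   dataset = dataset.batch(batch_size)
```

Only one chunk is held in memory at a time, which is the point of the approach when the full dataset does not fit.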

TensorFlow Experiment: how to avoid loading all data in memory with input_fn?

2 answers: