I am using srun to run my program, but it does not print its output.
me@home:~$ srun -p K80q --gres=gpu:1 -N 1 python3 main.py
2019-05-15 19:56:43.305156: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-15 19:56:43.543516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:85:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-05-15 19:56:43.543567: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-15 19:56:43.900189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-15 19:56:43.900248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-05-15 19:56:43.900257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-05-15 19:56:43.900619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10761 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
I only get the output above; the information I expect is never printed. How can I fix this?
By the way, if I just define a small test script:
import tensorflow
if __name__ == '__main__':
    for i in range(10):
        print('Hello')
it prints Hello 10 times.
Update:
After 20 minutes, it prints some of the information I expect. How can I get the output immediately?
Answer 0 (score: 1)
Try the -u option of srun:
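-u, --unbuffered By default, the connection between slurmstepd and the user-launched application is over a pipe. The stdio output written by the application is buffered by glibc until it is flushed or the output is set as unbuffered. See setbuf(3). If this option is specified, the tasks are executed with a pseudo terminal so that the application output is unbuffered.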
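With the command from the question, that would be srun -u -p K80q --gres=gpu:1 -N 1 python3 main.py. If changing the srun invocation is not convenient, a complementary approach is to flush from inside the program. The sketch below assumes the delay comes from Python's own block buffering when stdout is a pipe rather than a terminal; the helper name report is hypothetical and not part of the original main.py.

```python
import sys


def report(message, stream=None):
    """Print one line and flush it right away.

    `report` is a hypothetical helper for illustration. flush=True pushes
    the text out of Python's buffer immediately, instead of waiting for
    the buffer to fill or for the program to exit.
    """
    if stream is None:
        stream = sys.stdout
    print(message, file=stream, flush=True)


if __name__ == '__main__':
    for i in range(10):
        report('Hello')
```

Running the interpreter as python3 -u main.py, or setting the PYTHONUNBUFFERED environment variable, should have a similar effect without touching the code.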