Question

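I wrote a simple crawler in Python with an infinite loop, so it should never stop on its own. With a random delay of 17~30, the crawler fetches the same page over and over, finds the 'href' links that are updated periodically, and stores them in MySQL. It runs on an Ubuntu server. I started it with the Linux command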

$ nohup python crawer.py &

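so the crawler runs in the background on the Ubuntu server. It ran for about 4 hours, and then the crawling suddenly stopped. The next day I tried again and it worked fine! What is the problem? Is the web page blocking me? Or does the nohup command have a time limit? Thanks a lot.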

Answer 1

No, nohup will do exactly what it was designed to do. That is:

 The nohup utility invokes utility with its arguments and at this time
 sets the signal SIGHUP to be ignored.  If the standard output is a
 terminal, the standard output is appended to the file nohup.out in the
 current directory.  If standard error is a terminal, it is directed to
 the same place as the standard output.

 Some shells may provide a builtin nohup command which is similar or
 identical to this utility.  Consult the builtin(1) manual page.

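In Bash (and other shells), & puts the job in the background. nohup combined with & effectively lets the process keep running in the background even after you terminate your tty/pty session. Neither of them imposes a time limit.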
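
To make the SIGHUP part concrete, here is a minimal Python sketch of the same idea done from inside the script. The function name `ignore_sighup` is made up for illustration, and this only mirrors the signal handling: it does not redirect output to nohup.out the way nohup does. It assumes a Unix-like system where `signal.SIGHUP` exists.

```python
import signal


def ignore_sighup():
    """Ignore SIGHUP in this process (hypothetical helper, Unix only)."""
    # signal.signal returns the previous handler, so a caller could restore it.
    return signal.signal(signal.SIGHUP, signal.SIG_IGN)
```

Calling `ignore_sighup()` near the top of the crawler would keep a closed terminal from killing it, but it would not protect against the script crashing on its own.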

I think the problem is that your Python program crashed. You should spend some time adding logging to find out why, e.g.:

nohup my_app.py &> myapp.log &
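
Redirecting output captures whatever the script prints before it dies. If you also want the script itself to record what went wrong, something along these lines could work. This is only a sketch: `configure_logging`, `run_forever`, `crawl_once`, and the file name `crawler.log` are placeholder names, I am guessing that the 17~30 delay is in seconds, and `max_iterations` and `sleep` are parameters I added only so the loop can be tried out without waiting.

```python
import logging
import random
import time


def configure_logging(path="crawler.log"):
    """Send log records, including tracebacks, to a file (hypothetical path)."""
    logging.basicConfig(
        filename=path,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )


def run_forever(crawl_once, min_delay=17, max_delay=30,
                max_iterations=None, sleep=time.sleep):
    """Call crawl_once in a loop, logging exceptions instead of exiting silently."""
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        try:
            crawl_once()
            logging.info("iteration %d finished", iterations)
        except Exception:
            # logging.exception records the full traceback at ERROR level.
            logging.exception("iteration %d failed", iterations)
        iterations += 1
        # Random pause between requests; seconds are an assumption here.
        sleep(random.uniform(min_delay, max_delay))
    return iterations
```

You would call `configure_logging()` once and then `run_forever(crawl_once)`, where `crawl_once` is your own function that fetches the page and writes to MySQL. Note that catching `Exception` keeps the loop alive after a failure, which is a design choice: if you would rather have the crawler stop on the first error, log the exception and re-raise it. Either way the log should show whether it was a network error, a database error, or something else.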
