Question

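We are having trouble with crontabs on an Ubuntu 12.04 LTS server.

Goal: an automated Python script that saves its output as .json

Purpose: this .json file needs to be imported into a MongoDB database (also automated)

Setup: Ubuntu Server, 12.04 LTS

Note: the webcrawler code has #!/usr/bin/python as its first line

Let's start from the beginning. All we need to do is automate a Python script with crontab. We have a webcrawler written in Python. Its output is saved to a .json file; this is hard-coded in the script. We need the crawler to run several times a day so that the .json file can be used for other purposes.

Here is what we have tried so far:

We started by making the .py file executable: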

$ sudo chmod -x ./crawler.py
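A note on the command above: `chmod -x` removes the execute bit, while `chmod +x` adds it. The later crontab entries call `/usr/bin/python` explicitly, so the bit matters only for the first entry, which runs the file directly. The following is a minimal sketch for checking the bit from Python; the helper name `is_executable_by_owner` and the temporary demo file are illustrative assumptions, not part of the crawler.

```python
import os
import stat
import tempfile

def is_executable_by_owner(path):
    """Return True if the owner's execute bit is set on the file."""
    return bool(os.stat(path).st_mode & stat.S_IXUSR)

# Temporary file standing in for crawler.py (hypothetical demo file).
with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as handle:
    demo_path = handle.name

os.chmod(demo_path, 0o644)   # no execute bit, as after "chmod -x"
before = is_executable_by_owner(demo_path)

os.chmod(demo_path, 0o755)   # execute bit set, as after "chmod +x"
after = is_executable_by_owner(demo_path)

os.remove(demo_path)
print(before, after)
```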

Then we created a cron job:

$ sudo crontab -e
20 16 2 1 5 /home/user/crawler.py
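For readers unsure how the five leading numbers are read, here is a small sketch that labels them. The helper `split_cron_entry` is a hypothetical illustration and not part of cron. The field order (minute, hour, day of month, month, day of week) is the standard crontab layout.

```python
# Standard order of the five time fields in a crontab entry.
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")

def split_cron_entry(entry):
    """Split a crontab line into a dict of time fields and the command."""
    parts = entry.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five time fields followed by a command")
    schedule = dict(zip(FIELD_NAMES, parts[:5]))
    return schedule, parts[5]

schedule, command = split_cron_entry("20 16 2 1 5 /home/user/crawler.py")
print(schedule)
print(command)
```

Read this way, the entry fires at 16:20 in January only, on the 2nd of the month and on Fridays (when both day fields are restricted, cron runs the job if either matches). That suits a one-off test, but running several times a day needs wildcards in the day and month fields, for example `0 */6 * * *` for every six hours.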

Next we checked whether the job had run correctly, since we need the .json file on the server. We used:

$ sudo grep CRON /var/log/syslog

Our output:

Jan  2 15:17:01 ubuntu-vm CRON[20062]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan  2 16:17:01 ubuntu-vm CRON[20927]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan  2 16:20:01 ubuntu-vm CRON[20933]: (root) CMD (python /home/user/crawler.py)
Jan  2 16:20:01 ubuntu-vm CRON[20932]: (CRON) info (No MTA installed, discarding output)

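Discarding output? No! We need that output! We used Google and other Stack Overflow posts to look for a solution to the "No MTA installed" message. We found a few links and did two things: installed postfix (an MTA) and edited the crontab.

*No configuration*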

$ sudo apt-get install postfix
35 16 2 1 5 /usr/bin/python /home/user/crawler.py

Our output:

Jan  2 16:35:01 ubuntu-vm CRON[22459]: (root) CMD (/usr/bin/python /home/user/crawler.py)
Jan  2 16:35:01 ubuntu-vm CRON[22458]: (root) MAIL (mailed 1 byte of output; but got status 0x004b, #012)

Finally, we tried one more thing: adding /dev/null 2>$1 to the crontab entry in order to skip the "MTA thingy".

$ sudo crontab -e    
52 16 2 1 5 /usr/bin/python /home/user/crawler.py /dev/null 2>$1

Jan  2 16:52:01 ubuntu-vm CRON[22494]: (root) CMD (/usr/bin/python /home/user/crawler.py /dev/null 2>&1)
Jan  2 16:52:01 ubuntu-vm CRON[22493]: (root) MAIL (mailed 1 byte of output; but got status 0x004b, #012)

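So, what now? We can't figure out what we are doing wrong. We just need a Python script to run automatically and produce a .json file as output. Could someone take a look at this post and help us? We would really appreciate it! :)

Edit 1 @Digisec @Alex Martelli

The reason we tried /dev/null 2>&1 is that, after setting up the crontab, we got an error in syslog about the MTA (mail transfer agent). We added it to skip that error, but apparently it makes no difference. In fact, it has no function at all in this cron job. We don't know what /dev/null actually does. We just need crawler.py to run at a set interval.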
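On what /dev/null does: it is a device that discards whatever is written to it, and `> /dev/null 2>&1` sends both stdout and stderr there. In the entry as posted, /dev/null has no `>` in front of it, so it is passed to the script as an ordinary argument. The sketch below reproduces the intended redirection with Python's `subprocess` module (Python 3.5 or later). The child command is a made-up stand-in for the crawler.

```python
import subprocess
import sys

# Stand-in for the crawler: writes one line to stdout and one to stderr.
child_code = "import sys; print('to stdout'); sys.stderr.write('to stderr\\n')"

# Equivalent of: command > /dev/null 2>&1
silenced = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.DEVNULL,   # stdout is thrown away
    stderr=subprocess.STDOUT,    # stderr follows stdout, so it is discarded too
)

# Same command, but with both streams captured so they can be inspected.
captured = subprocess.run(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)
print(captured.stdout.decode())
```

Note that this redirection only hides what the script prints. It has no effect on the data.json file the script writes itself.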

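You may notice that the JSON format is off. This crawler was actually written to produce data.txt and NOT data.json. Some people told us that would make no difference. Strictly speaking it should be .txt, but we need the data in MongoDB, and MongoDB does not work with txt.

Thanks for your comments!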

def spider_Products():
    print('Loading products...')
    dataset = open('data.json', 'w')
    for page in bbvProductpages:
        source_code = urllib.request.urlopen(page)
        main_data = BeautifulSoup(source_code.read())
        link = main_data.findAll('div', {'class':'listRow'})
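One thing worth checking in the snippet above: `open('data.json', 'w')` uses a relative path, so the file lands in whatever directory the process was started from. Under cron that is usually not the script's directory, so the file may be written somewhere unexpected. A minimal sketch of anchoring the output next to the script follows. The helper names `output_path_for` and `save_products`, and the sample record, are assumptions for illustration only.

```python
import json
import os
import tempfile

def output_path_for(script_path, filename="data.json"):
    """Place the output file next to the script, independent of the cwd."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(script_dir, filename)

def save_products(products, path):
    """Write the scraped records as one JSON document."""
    with open(path, "w") as dataset:
        json.dump(products, dataset, indent=2)

# Demo: a temporary directory stands in for /home/user.
demo_dir = tempfile.mkdtemp()
demo_script = os.path.join(demo_dir, "crawler.py")
target = output_path_for(demo_script)

# Made-up record; the real crawler would pass its scraped data here.
save_products([{"name": "example product", "price": 9.99}], target)

with open(target) as dataset:
    loaded = json.load(dataset)
print(target)
print(loaded)
```

Inside the real script, `output_path_for(__file__)` would give the path next to crawler.py. Writing through `json.dump` also guarantees the file is valid JSON, whatever its extension.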

Edit 2 @Digisec

ubuntu-0864947@ubuntu-vm:~$ sudo bash ./crawler.py
./crawler.py: line 2: __author__: command not found
./crawler.py: line 3: import: command not found
./crawler.py: line 4: import: command not found
from: can't read /var/mail/bs4
./crawler.py: line 8: url: command not found
./crawler.py: line 10: bbvSubmenus: command not found
./crawler.py: line 11: bbvProductpages: command not found
./crawler.py: line 12: bbvSublevels: command not found
./crawler.py: line 13: /html/product/listing.html?navId=2436&bgid=8148&tk=7&lk=9309,: No such file or directory
./crawler.py: line 14: /html/highlights/page.html?hgid=658&tgid=972&tk=7&lk=9302,: No such file or directory
./crawler.py: line 15: /html/product/listing.html?navId=1216&tk=7&lk=9529,: No such file or directory
./crawler.py: line 16: /html/product/listing.html?navId=1218&tk=7&lk=9530,: No such file or directory
./crawler.py: line 17: /html/product/listing.html?navId=1222&tk=7&lk=9531: No such file or directory
./crawler.py: line 18: ]: command not found
./crawler.py: line 20: syntax error near unexpected token `('
./crawler.py: line 20: `def spider_Mainpage():'
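The errors above are what bash reports when it is asked to interpret Python source. Naming `bash` on the command line overrides the shebang, which bash then treats as a comment (the first complaint is about line 2). The sketch below shows how a shebang line maps to an interpreter; `interpreter_from_shebang` is a hypothetical helper, not a standard function.

```python
def interpreter_from_shebang(first_line):
    """Return the interpreter command named by a shebang line, or None."""
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip().split()

print(interpreter_from_shebang("#!/usr/bin/python"))
print(interpreter_from_shebang("#!/usr/bin/env python"))
print(interpreter_from_shebang("import urllib.request"))
```

The shebang is only consulted when the file is executed directly (for example `./crawler.py` with the execute bit set), not when an interpreter is named explicitly.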

Path: /home/user/crawler.py

Reason for /dev/null: http://ubuntu.aspcode.net/view/635400140124705175345096/cron-info-no-mta-installed-discarding-output-error-in-the-syslog

Edit 3 @Digisec

I changed the first line to

#!/usr/bin/env python

as you asked me to. Executing the .py file:

user@ubuntu-vm:~$ python ./crawler.py
File "./crawler.py", line 73
SyntaxError: Non-ASCII character '\xe2' in file ./crawler.py on line 73, but no 
encoding declared; see http://www.python.org/peps/pep-0263.html for details
user@ubuntu-vm:~$

Can you see what we are missing?
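Two observations on the error above, offered as likely causes rather than a confirmed diagnosis. First, this message comes from Python 2, which assumes ASCII source unless an encoding is declared (PEP 263). The posted snippet uses `urllib.request`, which exists only in Python 3, so the script probably needs to run under `python3` rather than `python`. Second, the byte `\xe2` often begins a UTF-8 encoded typographic quote or dash pasted into the source. The sketch below locates such lines. The helper `find_non_ascii_lines` and the sample bytes are illustrative assumptions.

```python
def find_non_ascii_lines(source_bytes):
    """Return (line_number, line) pairs for lines containing non-ASCII bytes."""
    hits = []
    for number, line in enumerate(source_bytes.splitlines(), start=1):
        if any(byte > 127 for byte in bytearray(line)):
            hits.append((number, line))
    return hits

# Made-up sample: line 2 contains a UTF-8 right single quote (E2 80 99).
sample = b"print('ok')\nprint('\xe2\x80\x99')\n"
hits = find_non_ascii_lines(sample)
for number, line in hits:
    print(number, line)
```

To scan the real file, one would read it in binary mode, for example `find_non_ascii_lines(open('crawler.py', 'rb').read())`, and inspect the reported line (73 in the traceback).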

Crontab not executing a Python script

0 answers: