Crawler can't connect to MySQL database under heavy load

Asked: 2014-04-23 12:22:44

Tags: python mysql web-crawler

I have a crawler written in Python that fails to connect to the database after crawling for a while. Sometimes it works for about 10 minutes before failing with the following error log:

(2003, "Can't connect to MySQL server on 'localhost' (10055)")
(2003, "Can't connect to MySQL server on 'localhost' (10055)")
Traceback (most recent call last):
  File "C:\Users\Admin\Documents\eclipse\workspace\Crawler\src\Crawlers\Zanox.py", line 73, in <module>
    c.main()
  File "C:\Users\Admin\Documents\eclipse\workspace\Crawler\src\Crawlers\Zanox.py", line 38, in main
    self.getInfo()
  File "C:\Users\Admin\Documents\eclipse\workspace\Crawler\src\Crawlers\Zanox.py", line 69, in getInfo
    comparator.main()
  File "C:\Users\Admin\Documents\eclipse\workspace\Crawler\src\CrawlerHelpScripts\Comparator.py", line 23, in main
    self.compare()
  File "C:\Users\Admin\Documents\eclipse\workspace\Crawler\src\CrawlerHelpScripts\Comparator.py", line 36, in compare
    deliveryInfo = self.db.getDeliveryInfo()
  File "C:\Users\Admin\Documents\eclipse\workspace\Crawler\src\Database\dell.py", line 29, in getDeliveryInfo
    result = self.db.select(com, vals)
  File "C:\Users\Admin\Documents\eclipse\workspace\Crawler\src\Database\Database.py", line 24, in select
    self.con.close()
_mysql_exceptions.ProgrammingError: closing a closed connection

So at some point the crawler can no longer connect to the database, which runs on localhost, and then it raises the ProgrammingError shown above. Handling that exception is not the problem, since the script keeps running; but it also keeps producing the "can't connect" errors.
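As far as I can tell, the ProgrammingError itself comes from the finally block in the select() method shown below: when mdb.connect() fails, self.con still refers to the connection that the previous call already closed, so close() runs on it a second time. A minimal standalone sketch (placeholder credentials, not the real crawler code) that reproduces that exact exception:

import MySQLdb as mdb

# Placeholder credentials, just to demonstrate the double close.
con = mdb.connect("localhost", "user", "password", "testdb")
con.close()        # first close: what the finally block normally does

try:
    con.close()    # second close on the same object
except mdb.ProgrammingError as e:
    print e        # prints: closing a closed connection

So the double close looks like a symptom of the failed reconnect rather than the cause.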

Here is the code I use to select from / insert into the database:

def select(self, com, vals):
    try:
        # A new connection is opened for every query.
        self.con = mdb.connect(self.host, self.user, self.password, self.database)
        cur = self.con.cursor()
        cur.execute(com, vals)
        ver = cur.fetchall()
        return ver
    except mdb.Error as e:
        print e
    finally:
        if self.con:
            self.con.close()

def insert(self, com, vals):
    try:
        # Same connect-per-query pattern, plus a commit for the write.
        self.con = mdb.connect(self.host, self.user, self.password, self.database)
        cur = self.con.cursor()
        cur.execute(com, vals)
        self.con.commit()
    except mdb.Error as e:
        print e
    finally:
        if self.con:
            self.con.close()
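For reference, error 10055 on Windows appears to be the Winsock WSAENOBUFS error ("No buffer space available"), which as far as I understand can occur when sockets are opened and closed in rapid succession. Since select() and insert() open a brand-new TCP connection for every query, here is a rough, untested sketch of what a single shared connection could look like instead (attribute names are my guesses, not the real Database.py):

import MySQLdb as mdb

class Database(object):
    # Sketch: reuse one connection instead of opening one per query.

    def __init__(self, host, user, password, database):
        self.host = host
        self.user = user
        self.password = password
        self.database = database
        self.con = None

    def _connect(self):
        # Only dial out when there is no live connection yet.
        if self.con is None:
            self.con = mdb.connect(self.host, self.user,
                                   self.password, self.database)
        return self.con

    def select(self, com, vals):
        try:
            cur = self._connect().cursor()
            cur.execute(com, vals)
            return cur.fetchall()
        except mdb.Error as e:
            print e
            self.close()  # drop the broken connection; reconnect next call

    def insert(self, com, vals):
        try:
            cur = self._connect().cursor()
            cur.execute(com, vals)
            self.con.commit()
        except mdb.Error as e:
            print e
            self.close()

    def close(self):
        if self.con is not None:
            self.con.close()
            self.con = None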

Also, the crawler is not multithreaded. Any ideas why it keeps losing the connection?

Edit: It seems the crawler works fine until it has inserted about 15,000 records into the database. If the table already contains around 15,000 records, the crawler starts producing the errors sooner.

0 Answers