Long-running Python process slows down over time

Asked: 2015-02-23 09:04:08

Tags: python python-2.7

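I have run into a performance problem.

My script, which checks whether files recorded in a database actually exist on NFS, has slowed down significantly after a few days of running.

For example, at the start 1000 files took 7 seconds, but now they take more than 40 seconds. I can't stop the script right now, and I can't attach a debugger to it.

I need to check more than 24 million records.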
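To put those figures in perspective, here is a rough projection of the total run time at each rate. It is only a sketch: it assumes the per-batch time stays constant for the whole run, and it treats "more than 24 million" and "more than 40 seconds" as exactly 24 million and 40 seconds, so the results are lower bounds. The helper name `total_hours` is made up for this illustration.

```python
# Figures taken from the question; everything else is illustrative.
TOTAL_RECORDS = 24000000        # "more than 24 million", so a lower bound
BATCH = 1000                    # records per reported timing
SECONDS_PER_BATCH_START = 7.0   # rate at the beginning of the run
SECONDS_PER_BATCH_NOW = 40.0    # rate after a few days ("more than 40")


def total_hours(seconds_per_batch, total=TOTAL_RECORDS, batch=BATCH):
    """Projected hours to check `total` records at a constant per-batch rate."""
    return (total / float(batch)) * seconds_per_batch / 3600.0


hours_at_start = total_hours(SECONDS_PER_BATCH_START)
hours_now = total_hours(SECONDS_PER_BATCH_NOW)
slowdown = SECONDS_PER_BATCH_NOW / SECONDS_PER_BATCH_START

print("at the initial rate: %.1f hours" % hours_at_start)
print("at the current rate: %.1f hours (%.1f days)" % (hours_now, hours_now / 24.0))
print("slowdown factor: %.1fx" % slowdown)
```

At the initial rate the whole check would take roughly two days; at the current rate it would take more than eleven days, which is why the slowdown matters for a job of this size.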

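import time
from logging import FileHandler, getLogger

import database_connector  # the asker's own module, not shown in the post

# Logger names are assumed; the post does not show how the loggers are created.
absent_files_detailed = getLogger("absent-files-detailed")
absent_files = getLogger("absent-files")
sql_logger = getLogger("database-sql")

absent_files_detailed.addHandler(FileHandler("logs/absent-files-detailed.log"))
absent_files.addHandler(FileHandler("logs/absent-files.log"))
sql_logger.addHandler(FileHandler("logs/database-sql.log"))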

# database configurations
datastorage_file_database = database_connector.DataConfig("host", "database", "user", "password")

LOG_MESSAGE_FREQUENCY = 1000
SELECT_BATCH_SIZE = 1000


def init():
    pass


def check():
    connection_factory = database_connector.Connection()
    database = connection_factory.open(datastorage_file_database)

    number_of_absent_items = 0
    index = -1  # stays -1 if the table is empty, so the final count prints 0
    for index, record in enumerate(database_records(database, SELECT_BATCH_SIZE)):
        if index % LOG_MESSAGE_FREQUENCY == 0:
            print cur_time() + "%s files processed" % index
        if not on_disk(record["path"]):
            number_of_absent_items += 1
            absent_files_detailed.warning(cur_time() + "item '%s' -> '%s' does not exist on disk!",
                                          record["id"], record["path"])
            absent_files.warning(record["id"])
    print "END ->> %s items processed" % (index + 1)
    print str(number_of_absent_items) + " files do not exist on disk, but are recorded in the database."


def database_records(database, fetch_size=1000):
    page = 0
    while True:
        # A new cursor is created for every page and closed once the page is read.
        cursor = database.cursor()

        select_query = "SELECT file_absolute_path, id FROM item LIMIT %d OFFSET %d" % (
            fetch_size, page * fetch_size)
        sql_logger.warning(cur_time() + select_query)

        cursor.execute(select_query)
        # Move on to the next page.
        page += 1
        results = cursor.fetchmany(fetch_size)
        cursor.close()
        if not results:
            break
        for record in results:
            yield dict(path=record[0], id=record[1])


def cur_time():
    return str(time.asctime()) + ": "

I recreate the database cursor every time because I ran into the "database has gone away" problem.
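The paging pattern described here (a fresh cursor and a `LIMIT ... OFFSET ...` query per page) can be sketched in a self-contained form as follows. This is an illustration under stated assumptions, not the asker's actual setup: the standard-library `sqlite3` module stands in for the unnamed database driver behind `database_connector`, the demo table and its 25 rows are made up, and the `ORDER BY id` clause is an addition, since without an explicit ordering the pages of a `LIMIT`/`OFFSET` query are not guaranteed to be stable. The function name `paged_records` is hypothetical.

```python
import sqlite3


def paged_records(connection, batch_size=1000):
    """Yield rows page by page, opening a fresh cursor for every page."""
    page = 0
    while True:
        cursor = connection.cursor()
        try:
            # ORDER BY is an assumption added for stable paging.
            cursor.execute(
                "SELECT file_absolute_path, id FROM item "
                "ORDER BY id LIMIT ? OFFSET ?",
                (batch_size, page * batch_size),
            )
            rows = cursor.fetchall()
        finally:
            # The cursor is closed even if the query raises.
            cursor.close()
        if not rows:
            break
        for path, item_id in rows:
            yield {"path": path, "id": item_id}
        page += 1


# Hypothetical demo data: 25 rows in an in-memory database.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, file_absolute_path TEXT)"
)
connection.executemany(
    "INSERT INTO item (id, file_absolute_path) VALUES (?, ?)",
    [(i, "/data/file_%d" % i) for i in range(1, 26)],
)

records = list(paged_records(connection, batch_size=10))
print("%d records read in pages of 10" % len(records))
```

With this pattern the database generally has to step over all the skipped rows before returning a page, so queries with a large `OFFSET` can cost more than early ones. Whether that accounts for the slowdown here is not established by the post; timing the query separately from the disk check would show it.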

UPD: Thanks for the replies. There is no memory problem, according to the output of the top command:

load averages:  0.18,  0.19,  0.19; up 59+19:32:39 09:21:25
120 processes: 119 sleeping, 1 on cpu
CPU states: 98.9% idle,  0.7% user,  0.5% kernel,  0.0% iowait,  0.0% swap
Kernel: 2356 ctxsw, 5 trap, 2219 intr, 1935 syscall
Memory: 64G phys mem, 33G free mem, 4096M total swap, 4096M free swap
@pbastian That must be my problem. Thanks, I'll check it.

0 answers:

No answers yet.