Question

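I am struggling with multithreading using a connection pool on Django.

I know Python threads are subject to the GIL, but I thought that if most of the work is DB I/O, Python threads would be enough to improve performance.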
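
The premise that threads help with I/O-bound work can be checked in isolation from Django and the database. The following is only a minimal sketch, not the code under test: `fake_db_call` and `run` are hypothetical names, and `time.sleep` stands in for a blocking database round trip on the assumption that the real driver also releases the GIL while it waits.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_db_call(_):
    # time.sleep stands in for a blocking database round trip (an assumption);
    # like real socket I/O, it releases the GIL while waiting.
    time.sleep(0.05)


def run(workers, tasks=20):
    """Return the wall-clock seconds needed to run `tasks` calls on `workers` threads."""
    pool = ThreadPool(processes=workers)
    start = time.time()
    pool.map(fake_db_call, range(tasks))
    pool.close()
    pool.join()
    return time.time() - start


if __name__ == "__main__":
    for workers in (1, 10):
        print("%2d worker(s): %.2f s" % (workers, run(workers)))
```

If the 10-worker run is clearly faster here but not in the real test, the threads themselves are fine and the serialization is happening somewhere else, such as in the driver or in the database.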

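First, I wrote a small piece of code to test my idea.

In short, the code uses threadPool.apply_async() together with the database connection pool configured by CONN_MAX_AGE in settings.py.

With this code, I ran the test repeatedly while varying the number of worker threads.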

from multiprocessing            import pool
from threadPoolTestWithDB_IO    import models
from django.db                  import transaction
import django
import datetime
import logging
import g2sType


def addEgm(pre, id_):
    """
    @summary: This function only inserts a bundle of records tied by a foreign key 
    """
    try:
        with transaction.atomic():

            egmId = pre + "_" + str(id_)
            egm = models.G2sEgm(egmId=egmId, egmLocation="localhost")
            egm.save()

            device = models.Device(egm=egm,
                          deviceId=1,
                          deviceClass=g2sType.t_deviceClass.G2S_eventHandler,
                          deviceActive=True)
            device.save()

            models.EventHandlerProfile(device=device, queueBehavior="a").save()
            models.EventHandlerStatus(device=device).save()

            for i2 in range(1, 200):
                models.EventReportData(device=device,
                                       deviceClass=g2sType.t_deviceClass.G2S_communications,
                                       deviceId=1,
                                       eventCode="TEST",
                                       eventText="",
                                       eventId=i2,
                                       transactionId=0
                                       ).save()

            print("Done %d" % id_)


    except Exception as e:
        logging.root.exception(e)


if __name__ == "__main__":

    django.setup()
    logging.basicConfig()

    print("Start test")

    tPool = pool.ThreadPool(processes=1)    # Set the number of worker threads

    s = datetime.datetime.now()
    for i in range(100):                    # Set the number of record bundles
        tPool.apply_async(func=addEgm, args=("a", i))

    print("Wait for worker threads")
    tPool.close()
    tPool.join()

    e = datetime.datetime.now()
    print("End test")

    print("Time Measurement : %s" % (e - s,))

    models.G2sEgm.objects.all().delete()    # Remove all records inserted during the test
--------------------------
# settings.py


DATABASES = {
             'default': {
                         'ENGINE': 'django.db.backends.oracle',
                         'NAME': 'orcl',
                         'USER': 'test',
                         'PASSWORD': '1123',
                         'HOST': '192.168.0.90',
                         'PORT': '1521',
                         'CONN_MAX_AGE': 100,
                         'OPTIONS': {'threaded': True}
                         }
             }

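However, the results show no significant difference between one worker thread and several.

For example, it takes 30.6 sec with 10 threads and 30.4 sec with 1 thread.

What am I doing wrong?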

Answer 1

You have a problem at the database level. You can prove it by executing this query:

select /* +rule */
       s1.username || '@' || s1.machine || ' ( SID=' || s1.sid || ' ' || s1.program || ' ) is blocking '
       || s2.username || '@' || s2.machine || ' ( SID=' || s2.sid || ' ' || s2.program || ' ) ' AS blocking_status
  from v$lock l1, v$session s1, v$lock l2, v$session s2
 where s1.sid = l1.sid
   and s2.sid = l2.sid
   and l1.BLOCK = 1
   and l2.request > 0
   and l1.id1 = l2.id1
   and l1.id2 = l2.id2;

Or there are threads blocked in Python (possibly at the DB driver level). Attach gdb to the Python process and look at the backtrace of every thread (for example with gdb's `thread apply all bt`).

You'll see.
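
If attaching gdb is not convenient, a rough in-process alternative is to print the Python-level stack of every thread while the test is running. This is a swapped-in technique, not what the answer above prescribes: `dump_thread_stacks` is a hypothetical helper built on the standard library's `sys._current_frames()`. It only shows Python frames, so a thread stuck inside the driver's C code appears at the last Python line that called into it.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return a string with the current Python stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    chunks = []
    for ident, frame in sys._current_frames().items():
        chunks.append("Thread %s (%s):\n" % (ident, names.get(ident, "unknown")))
        chunks.append("".join(traceback.format_stack(frame)))
    return "".join(chunks)


if __name__ == "__main__":
    gate = threading.Event()

    def blocked_worker():
        gate.wait()  # stands in for a worker stuck inside a driver call

    worker = threading.Thread(target=blocked_worker, name="worker-1")
    worker.start()
    print(dump_thread_stacks())
    gate.set()
    worker.join()
```

Calling the helper a few times from the main thread while the pool is busy should show whether most workers sit at the same line, which would point to a shared lock rather than to the database itself.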
