Analyzing a Connection Closed Exception in a Spring / JPA / MySQL / Tomcat application

Time: 2014-02-11 10:05:24

Tags: java spring hibernate tomcat connection-pooling

Question

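I recently took over a Java web application whose code had already been written. The application receives moderately high traffic and peaks every day between 11 AM and 3 PM. It uses Spring, JPA (Hibernate) and a MySQL database. Spring is configured to use the Tomcat JDBC connection pool to connect to the database. (Configuration details are at the end of the post.)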

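Over the past few days, during the application's peak load hours, the application has repeatedly gone down because Tomcat stopped responding to requests. Tomcat had to be restarted several times.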

Going through Tomcat's catalina.out log, I noticed a lot of these:

Caused by: java.sql.SQLException: Connection has already been closed.
    at org.apache.tomcat.jdbc.pool.ProxyConnection.invoke(ProxyConnection.java:117)
    at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:109)
    at org.apache.tomcat.jdbc.pool.DisposableConnectionFacade.invoke(DisposableConnectionFacade.java:80)
    at com.sun.proxy.$Proxy28.prepareStatement(Unknown Source)
    at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:505)
    at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:423)
    at org.hibernate.jdbc.AbstractBatcher.prepareQueryStatement(AbstractBatcher.java:139)
    at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1547)
    at org.hibernate.loader.Loader.doQuery(Loader.java:673)
    at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
    at org.hibernate.loader.Loader.loadCollection(Loader.java:1994)
    ... 115 more

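These often appear just before the crash.

Looking further back, before these exceptions, I noticed that a lot of connections were being abandoned before the Connection Closed exceptions started: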

WARNING: Connection has been abandoned PooledConnection[com.mysql.jdbc.Connection@543c2ab5]:java.lang.Exception
    at org.apache.tomcat.jdbc.pool.ConnectionPool.getThreadDump(ConnectionPool.java:1065)
    at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:782)
    at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:618)
    at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:188)
    at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:128)
    at org.hibernate.ejb.connection.InjectedDataSourceConnectionProvider.getConnection(InjectedDataSourceConnectionProvider.java:47)
    at org.hibernate.jdbc.ConnectionManager.openConnection(ConnectionManager.java:423)
    at org.hibernate.jdbc.ConnectionManager.getConnection(ConnectionManager.java:144)
    at org.hibernate.jdbc.AbstractBatcher.prepareQueryStatement(AbstractBatcher.java:139)

These regularly appear before the Connection Closed exceptions, and they seem to be the first symptom in the logs that a crash is coming.

Analysis

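Based on the logs, I started checking whether any connection pool or MySQL configuration could be causing the problem. I went through some good articles on tuning the pool for production environments: Links 1 & 2

From these articles, I noticed the following: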

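  1. The following line in JHanik's article (Link 1) mentions this:

    Setting the value of abandonWhenPercentageFull to 100 means that connections are not considered abandoned unless we have reached our maxActive limit.

    I thought this might matter in my case, since I was seeing many connections being abandoned.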

  2. My max_connections setting did not match the recommended setting (in Link 2):

    mysql max_connections should be equal to max_active + max_idle

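What I did

So, following the advice in the articles, I did the following two things:

  1. Changed abandonWhenPercentageFull to 100.
  2. In my MySQL server, max_connections was set to 500; I increased it to 600. In my connection pool settings, max_active was 200 and max_idle was 50; I changed them to max_active = 350 and max_idle = 250.

This did not help

The next day, I made the following observations during peak hours:

  1. Tomcat did not go down; the application stayed up through the peak hours. However, performance got steadily worse until the application was almost unusable, even though it was not technically down.
  2. The database connection pool, although larger, was fully utilized: at one point I could see 350 active connections to the DB.

Finally, my question: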

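Clearly, there is a problem with the way the application server uses database connections. I see two directions in which to take this analysis.

My question is: which of these should I pursue?

1. The problem is not the connection pool settings; the code is causing it.

There may be places in the code where database connections are not closed, which leaves a large number of connections open.

The code uses a GenericDao that every DAO class extends. GenericDao uses Spring's JpaTemplate to obtain an EntityManager instance, which in turn is used for all database operations. My understanding is that JpaTemplate takes care of closing database connections internally.

So where exactly should I look for possible connection leaks?

2. The problem is in the connection pool / MySQL configuration parameters, and I need to tune them further.

If so, which parameters should I look at? Should I collect some data to determine more appropriate values for my connection pool (e.g., for max_active, max_idle, max_connections)?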


Appendix: full connection pool configuration

<bean id="dataSource" class="org.apache.tomcat.jdbc.pool.DataSource" destroy-method="close">
    <property name="driverClassName" value="com.mysql.jdbc.Driver" />
    <property name="url" value="jdbc:mysql://xx.xx.xx.xx" />
    <property name="username" value="xxxx" />
    <property name="password" value="xxxx" />
    <property name="initialSize" value="10" />
    <property name="maxActive" value="350" />
    <property name="maxIdle" value="250" />
    <property name="minIdle" value="90" />
    <property name="timeBetweenEvictionRunsMillis" value="30000" />
    <property name="removeAbandoned" value="true" />
    <property name="removeAbandonedTimeout" value="60" />
    <property name="abandonWhenPercentageFull" value="100" />
    <property name="testOnBorrow" value="true" />
    <property name="validationQuery" value="SELECT 1" />
    <property name="validationInterval" value="30000" />
    <property name="logAbandoned" value="true" />
    <property name="jmxEnabled" value="true" />
</bean>
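To make the interaction between removeAbandoned, removeAbandonedTimeout and abandonWhenPercentageFull concrete, here is a plain-Java sketch of the eligibility rule as the Tomcat JDBC pool documentation describes it: a borrowed connection becomes a candidate for removal only after it has been out longer than removeAbandonedTimeout and, when abandonWhenPercentageFull is non-zero, only if the share of connections in use is at or above that percentage. The class and method names are hypothetical; this is a simplified model under those assumptions, not the pool's actual code.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical, simplified model of the pool's abandoned-connection rule. */
public class AbandonRuleModel {

    static boolean isEligibleForAbandon(long borrowedForMillis, int inUse, int maxActive,
                                        int removeAbandonedTimeoutSec,
                                        int abandonWhenPercentageFull,
                                        boolean removeAbandoned) {
        if (!removeAbandoned) {
            return false; // removal of abandoned connections is switched off entirely
        }
        long timeoutMillis = TimeUnit.SECONDS.toMillis(removeAbandonedTimeoutSec);
        if (borrowedForMillis <= timeoutMillis) {
            return false; // still within removeAbandonedTimeout
        }
        if (abandonWhenPercentageFull == 0) {
            return true; // default: eligible as soon as the timeout has passed
        }
        double usedPercent = 100.0 * inUse / maxActive;
        return usedPercent >= abandonWhenPercentageFull;
    }

    public static void main(String[] args) {
        // removeAbandoned=true, removeAbandonedTimeout=60, abandonWhenPercentageFull=100, maxActive=350
        System.out.println(isEligibleForAbandon(90_000, 200, 350, 60, 100, true)); // false: pool not full
        System.out.println(isEligibleForAbandon(90_000, 350, 350, 60, 100, true)); // true: full and timed out
        System.out.println(isEligibleForAbandon(30_000, 350, 350, 60, 100, true)); // false: under 60 s
        System.out.println(isEligibleForAbandon(90_000, 200, 350, 60, 0, true));   // true: default percentage
    }
}
```

Under this model, abandonWhenPercentageFull=100 only postpones reclaiming long-held connections until all 350 are in use; it does not reduce the number of busy connections.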
        

3 Answers:

Answer 0 (score: 10)

This is very late for the OP, but it may help others in the future:

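I ran into a similar problem in a production environment with long-running batch jobs. The problem is that if your code holds a connection for longer than the time specified by the property:

<property name="removeAbandonedTimeout" value="60" />

and you have enabled:

<property name="removeAbandoned" value="true" />

then the connection is dropped in the middle of processing after 60 seconds. One possible workaround (which did not work for me) is to enable the interceptor: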

jdbcInterceptors="ResetAbandonedTimer"

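This resets the abandon timer for that connection every time a read or write happens. Unfortunately, in my case, processing would still sometimes take longer than the timeout before the next read or write to the database. So I was forced to either increase the timeout or disable removeAbandoned (I chose the former).

Hope this helps someone who runs into something similar!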
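The difference between the default timer and ResetAbandonedTimer, as described in this answer, can be sketched with a small hypothetical model: each list entry is the processing time before the next database call on the same borrowed connection. It assumes the timer restarts on every statement and ignores the fact that the real pool only checks for abandoned connections periodically (every timeBetweenEvictionRunsMillis); the class and method names are made up.

```java
import java.util.List;

/** Hypothetical model of a borrowed connection's abandon timer; not the interceptor's code. */
public class ResetTimerModel {

    /**
     * Returns the index of the first processing gap during which the connection would be
     * considered abandoned, or -1 if it never is.
     */
    static int firstAbandonedStep(List<Long> gapsMillis, long timeoutMillis, boolean resetOnActivity) {
        long sinceReference = 0; // time since borrow, or since the last statement when resetting
        for (int i = 0; i < gapsMillis.size(); i++) {
            sinceReference += gapsMillis.get(i);
            if (sinceReference > timeoutMillis) {
                return i; // the pool could reclaim the connection during this gap
            }
            if (resetOnActivity) {
                sinceReference = 0; // a read/write happened: restart the abandon timer
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long timeout = 60_000; // removeAbandonedTimeout=60
        List<Long> manyShortGaps = List.of(20_000L, 20_000L, 20_000L, 20_000L);
        List<Long> oneLongGap = List.of(20_000L, 75_000L, 20_000L);

        System.out.println(firstAbandonedStep(manyShortGaps, timeout, false)); // 3
        System.out.println(firstAbandonedStep(manyShortGaps, timeout, true));  // -1
        System.out.println(firstAbandonedStep(oneLongGap, timeout, true));     // 1
    }
}
```

The last call corresponds to the situation described above: a single processing gap longer than the timeout still gets the connection reclaimed even with the reset, which leaves raising removeAbandonedTimeout (or disabling removeAbandoned) as the remaining options.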

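Answer 1 (score: 1)

I was recently asked to investigate why a production system would go down from time to time. I want to share my findings, since they involve correlating the events that lead a JVM Tomcat application to actually crash along with the JDBC issues outlined above. This setup uses MySQL as the backend, so it is probably most useful in that scenario, but the issue may be the same on another platform.

Closed connections alone do not mean the application is broken

This is in a Grails application, but it applies to any JVM-based application:

tomcat/context.xml DB configuration; note the very small DB pool and removeAbandonedTimeout="10", since we want things to break: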

<Resource
 name="jdbc/TestDB"  auth="Container" type="javax.sql.DataSource"
              driverClassName="com.mysql.jdbc.Driver"
              url="jdbc:mysql://127.0.0.1:3306/test"
              username="XXXX"
              password="XXXX"
              testOnBorrow="true"
              testWhileIdle="true"
              testOnReturn="true"
              factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
              removeAbandoned="true"
              logAbandoned="true"
              removeAbandonedTimeout="10"
              maxWait="5000"
              initialSize="1"
              maxActive="2"
              maxIdle="2"
              minIdle="2"
              validationQuery="Select 1" />

A Quartz job that runs every minute (not that the frequency matters much; I think the app dies on the first attempt):

class Test2Job {
    static triggers = {
        cron name: 'test2', cronExpression: "0 0/1 * * * ?"
    }

    def testerService

    def execute() {
        println "starting job2 ${new Date()}"
        testerService.basicTest3()
    }
}

Now our testerService; please pay attention to the comments:

import groovy.sql.Sql

def dataSource

  /**
   * When this method is run from Quartz, all the JDBC pool settings appear to be ignored:
   * the job actually completes, even with sleep times that are huge compared to basicTest.
   * Strange and very different behaviour.
   * If I add Tester t = Tester.get(1L) and then execute the query below, I get a
   * connection pool closed error.
   * @return
   */
  def basicTest2() {
      int i=1
      while (i<21) {
          def sql = new Sql(dataSource)
          def query="""select id as id  from tester t
                  where id=:id"""
          def instanceList = sql.rows(query,[id:i as Long],[timeout:90])
          sleep(11000)
          println "-- working on ${i}"
          def sql1 = new Sql(dataSource)
          sql1.executeUpdate(
                  "update tester t set t.name=? where t.id=?",
                  ['aa '+i.toString()+' aa', i as Long])

          i++
          sleep(11000)
      }
      println "run ${i} completed"
  }


  /**
   * This is described in above oddity
   * so if this method is called instead you will see connection closed issues
   */
  def basicTest3() {
      int i=1
      while (i<21) {
          def t = Tester.get(i)
          println "--->>>> test3 t ${t.id}"

          /**
           * APP CRASHER - this call is vital and the most important part.
           * Without it, there are lots of closed connections, but the app keeps working
           * absolutely fine.
           * The test was originally based on execRun(), which returns about 6650 records;
           * that query returns in time and does not appear to crash the app.
           *
           * The moment this method is called (check what it currently does: it simply runs
           * a huge query that goes beyond the timeout values, and, as explained in previous
           * emails, MySQL keeps processing it), the app becomes non-responsive and the logs
           * clearly show that the application is broken.
           */
          execRun2()


          def sql1 = new Sql(dataSource)
          sleep(10000)
          sql1.executeUpdate("update tester t set t.name=? where t.id=?",['aa '+i.toString()+' aa', t.id])
          sleep(10000)
          i++
      }

  }


  def execRun2() {
      def query="""select new map (t as tester) from Tester t left join t.children c
left join t.children c
                  left join c.childrena childrena
                  left join childrena.childrenb childrenb
                  left join childrenb.childrenc childrenc , Tester t2 left join t2.children c2 left join t2.children c2
                  left join c2.childrena children2a
                  left join children2a.childrenb children2b
                  left join children2b.childrenc children2c
             where ((c.name like (:name) or
                  childrena.name like (:name) or
                  childrenb.name like (:name) or (childrenc is null or childrenc.name like (:name))) or
                  (
                  c2.name like (:name) or
                  children2a.name like (:name) or
                  children2b.name like (:name) or (children2c is null or children2c.name like (:name))
      ))

          """
      //println "query $query"
      def results = Tester.executeQuery(query,[name:'aa'+'%'],[timeout:90])
      println "Records: ${results.size()}"

      return results
  }


  /**
   * This is no different from basicTest2, and yet
   * it throws a connection closed error (note the sleep is 20, not 20000).
   * A connection closed error is thrown almost instantly when .get is used, versus
   * sql = new Sql(..), which is a manual connection.
   */
  def basicTest() {
      int i=1
      while (i<21) {
          def t = Tester.get(i)
          println "--- t ${t.id}"
          sleep(20)
          //println "publishing event ${event}"
          //new Thread({
          //    def event=new PurchaseOrderPaymentEvent(t,t.id)
          //    publishEvent(event)
          //} as Runnable ).start()

          i++
      }
  }

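This only happens when a query takes longer than expected, but there must be another element as well: the query itself keeps running on MySQL even after it has been killed on the application side. MySQL is still chewing through it.

What I think is happening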

job 1 - hits app -> hits mysql ->    (9/10 left)
         {timeout} -> app killed  -> mysql running (9/10)
 job 2 - hits app -> hits mysql ->    (8/10 left)
         {timeout} -> app killed  -> mysql running (8/10) 
.....
 job 10 - hits app -> hits mysql ->    (10/10 left)
         {timeout} -> app killed  -> mysql running (10/10)
 job 11 - hits app -> 

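If job 1 has still not finished by this point, there is nothing left in the pool and the application is simply broken: JDBC errors are thrown, and so on. It no longer matters if job 1 finishes after the crash.

You can monitor what is happening by checking MySQL. The queries seemed to run longer than the timeout, which goes against what this value is supposed to do, but this observation may not be reliable and the cause may lie elsewhere.

While testing, I saw two states in the process list, "Sending data" and "Sending to client":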
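The chain of events in the diagram can be checked with a hypothetical, deterministic simulation that uses simulated seconds instead of threads. It assumes, as this answer suspects, that a connection stays busy until MySQL finishes the query, even after the application has given up on it. The pool size of 10 mirrors the x/10 counts in the diagram; the 15-minute query time is an arbitrary assumption.

```java
import java.util.PriorityQueue;

/** Hypothetical simulation: runaway queries keep pool connections busy until MySQL finishes. */
public class PoolExhaustionSim {

    /** Returns the 1-based number of the first job that finds no free connection, or -1. */
    static int firstStarvedJob(int poolSize, int intervalSec, int querySec, int jobs) {
        PriorityQueue<Integer> releaseTimes = new PriorityQueue<>(); // when busy connections free up
        for (int job = 0; job < jobs; job++) {
            int now = job * intervalSec;
            // Connections whose query has finished on the server side go back to the pool.
            while (!releaseTimes.isEmpty() && releaseTimes.peek() <= now) {
                releaseTimes.poll();
            }
            if (releaseTimes.size() == poolSize) {
                return job + 1; // pool exhausted: this is where JDBC errors would start
            }
            releaseTimes.add(now + querySec); // the query holds this connection until it ends
        }
        return -1;
    }

    public static void main(String[] args) {
        // 10 connections, one job per minute, each runaway query holding its connection 15 minutes.
        System.out.println(firstStarvedJob(10, 60, 900, 30)); // 11
        // If queries finish within the interval, the pool never runs dry.
        System.out.println(firstStarvedJob(10, 60, 50, 30));  // -1
    }
}
```

In the first run, job 11 is the first to find every connection still busy, which matches the point in the diagram where the application breaks.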

|  92 | root | localhost:58462 | test | Query   |   80 | Sending data      | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
|  95 | root | localhost:58468 | test | Query   |  207 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
|  96 | root | localhost:58470 | test | Query   |  147 | Sending data      | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
|  97 | root | localhost:58472 | test | Query   |  267 | Sending data      | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
|  98 | root | localhost:58474 | test | Sleep   |   18 |                   | NULL                                                                                                 |
|  99 | root | localhost:58476 | test | Query   |  384 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
| 100 | root | localhost:58478 | test | Query   |  327 | Sending data      | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |

A few seconds later:

|  91 | root | localhost:58460 | test | Query   |   67 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
|  92 | root | localhost:58462 | test | Query   |  148 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
|  97 | root | localhost:58472 | test | Query   |  335 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
| 100 | root | localhost:58478 | test | Query   |  395 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |

A few seconds after that (all dead):
|  58 | root | localhost       | NULL | Query   |    0 | starting | show processlist |
|  93 | root | localhost:58464 | test | Sleep   |  167 |          | NULL             |
|  94 | root | localhost:58466 | test | Sleep   |  238 |          | NULL             |
|  98 | root | localhost:58474 | test | Sleep   |   74 |          | NULL             |
| 101 | root | localhost:58498 | test | Sleep   |   52 |          | NULL             |

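It may be worth creating a script that monitors the process list, perhaps capturing a more detailed result set with the exact queries being run, to determine which queries are killing your app.

Answer 2 (score: 0)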
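As a starting point for such a script, here is a hypothetical helper that scans rows in the same format as the SHOW PROCESSLIST output above (columns Id, User, Host, db, Command, Time, State, Info) and returns the Ids of queries running longer than a threshold. It only parses text; how the rows are collected (for example with the mysql client from cron, or by querying information_schema.PROCESSLIST over JDBC) is left open, and it assumes the Info column contains no '|' characters.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that flags long-running queries in SHOW PROCESSLIST text output. */
public class ProcessListWatch {

    /** Returns the Ids of "Query" rows whose Time column exceeds thresholdSec. */
    static List<Long> longRunningQueryIds(List<String> rows, int thresholdSec) {
        List<Long> ids = new ArrayList<>();
        for (String row : rows) {
            // Splitting "| 92 | root | ... |" on '|' gives an empty first element,
            // then Id, User, Host, db, Command, Time, State, Info.
            String[] cols = row.split("\\|");
            if (cols.length < 9) {
                continue; // border line, blank line, or truncated row
            }
            String command = cols[5].trim();
            String time = cols[6].trim();
            if (!command.equals("Query") || !time.matches("\\d+")) {
                continue; // skips the header row, Sleep connections, etc.
            }
            if (Integer.parseInt(time) > thresholdSec) {
                ids.add(Long.parseLong(cols[1].trim()));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
                "|  Id | User | Host            | db   | Command | Time | State             | Info |",
                "|  92 | root | localhost:58462 | test | Query   |   80 | Sending data      | select ... |",
                "|  98 | root | localhost:58474 | test | Sleep   |   18 |                   | NULL |",
                "|  99 | root | localhost:58476 | test | Query   |  384 | Sending to client | select ... |");
        System.out.println(longRunningQueryIds(rows, 90)); // [99]
    }
}
```

Logging the full Info column for the flagged Ids each time the script runs would give the "deeper result set" suggested above.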

  

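"The code uses a GenericDao that every DAO class extends. GenericDao uses Spring's JpaTemplate to obtain an EntityManager instance, which in turn is used for all database operations. My understanding is that JpaTemplate takes care of closing database connections internally."

This may be the source of your problem. You should not use JpaTemplate to obtain the EntityManager; that gives you an unmanaged EntityManager. In fact, you should not be using JpaTemplate at all.

The recommended approach is to write your DAOs against the plain EntityManager API and inject the EntityManager as usual (using @PersistenceContext).

If you really want to use JpaTemplate, use its execute method and pass in a JpaCallback; that gives you a managed EntityManager.

Also make sure you have a correct transaction setup; otherwise the connection is not closed properly, because Spring does not know that it should close it.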
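The advice to use execute with a JpaCallback is an instance of the template/callback pattern: the template borrows the resource, runs the caller's code, and releases the resource even if that code throws, so callers cannot forget to close it. The following stdlib-only sketch illustrates the pattern with a made-up FakePool, Lease and Template; it is not Spring's JpaTemplate and does not model transactions.

```java
import java.util.function.Function;

/** Hypothetical illustration of template/callback vs. handing out the raw resource. */
public class CallbackPatternDemo {

    /** A borrowed "connection" whose close() returns it to the pool (no checked exception). */
    interface Lease extends AutoCloseable {
        @Override
        void close();
    }

    /** Counts leases that have been borrowed but not yet closed. */
    static class FakePool {
        int borrowed;

        Lease borrow() {
            borrowed++;
            return () -> borrowed--; // closing the lease gives the connection back
        }
    }

    /** The template owns the lease's lifecycle; callers only supply the work. */
    static class Template {
        private final FakePool pool;

        Template(FakePool pool) {
            this.pool = pool;
        }

        <T> T execute(Function<Lease, T> callback) {
            try (Lease lease = pool.borrow()) {
                return callback.apply(lease); // the lease is closed even if this throws
            }
        }
    }

    public static void main(String[] args) {
        FakePool pool = new FakePool();
        Template template = new Template(pool);

        // Leaky style: borrow the resource directly and never close it.
        pool.borrow();
        System.out.println(pool.borrowed); // 1

        // Callback style: the template releases what it borrowed.
        int answer = template.execute(lease -> 42);
        System.out.println(answer);        // 42
        System.out.println(pool.borrowed); // 1 (only the leaked lease is outstanding)

        try {
            template.execute(lease -> { throw new IllegalStateException("query failed"); });
        } catch (IllegalStateException expected) {
            // The template still closed its lease before the exception propagated.
        }
        System.out.println(pool.borrowed); // 1
    }
}
```

Code that obtains the resource directly and relies on every caller to close it behaves like the leaky branch above, which is why leaks tend to hide in those paths.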