问题
我最近负责Java Web应用程序,代码已经编写完成。该应用程序接收适度高流量,并在每天上午11点至下午3点之间达到高峰时段。 该应用程序使用Spring,JPA(Hibernate),MYSQL DB。 Spring已配置为使用tomcat jdbc连接池来建立与DB的连接。 (帖子末尾的配置细节)
在过去的几天里,在应用程序的高峰负载时间内,由于tomcat对请求没有响应,应用程序一直在停止运行。它需要多次重启tomcat。
通过tomcat catalina.out日志,我注意到了很多
Caused by: java.sql.SQLException: Connection has already been closed.
at org.apache.tomcat.jdbc.pool.ProxyConnection.invoke(ProxyConnection.java:117)
at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:109)
at org.apache.tomcat.jdbc.pool.DisposableConnectionFacade.invoke(DisposableConnectionFacade.java:80)
at com.sun.proxy.$Proxy28.prepareStatement(Unknown Source)
at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:505)
at org.hibernate.jdbc.AbstractBatcher.getPreparedStatement(AbstractBatcher.java:423)
at org.hibernate.jdbc.AbstractBatcher.prepareQueryStatement(AbstractBatcher.java:139)
at org.hibernate.loader.Loader.prepareQueryStatement(Loader.java:1547)
at org.hibernate.loader.Loader.doQuery(Loader.java:673)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:236)
at org.hibernate.loader.Loader.loadCollection(Loader.java:1994)
... 115 more
这些经常在崩溃前出现。
在这些例外之前更进一步,我注意到在连接关闭异常之前放弃了很多连接。
WARNING: Connection has been abandoned PooledConnection[com.mysql.jdbc.Connection@543c2ab5]:java.lang.Exception
at org.apache.tomcat.jdbc.pool.ConnectionPool.getThreadDump(ConnectionPool.java:1065)
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:782)
at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:618)
at org.apache.tomcat.jdbc.pool.ConnectionPool.getConnection(ConnectionPool.java:188)
at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:128)
at org.hibernate.ejb.connection.InjectedDataSourceConnectionProvider.getConnection(InjectedDataSourceConnectionProvider.java:47)
at org.hibernate.jdbc.ConnectionManager.openConnection(ConnectionManager.java:423)
at org.hibernate.jdbc.ConnectionManager.getConnection(ConnectionManager.java:144)
at org.hibernate.jdbc.AbstractBatcher.prepareQueryStatement(AbstractBatcher.java:139)
这些似乎经常出现在Connection Closed异常之前。这些似乎是日志即将崩溃的第一个症状。
分析
通过日志,我开始查看是否存在可能导致问题的任何连接池配置/ mysql配置。 通过一些优秀的文章,展示了生产环境池的调整。链接1& 2
通过这些文章,我注意到了:
JHanik的文章(链接1)中的以下行提到了这个
将abandonWhenPercentageFull的值设置为100意味着连接不是>被视为放弃,除非我们达到了maxActive限制。
我认为这在我的案例中可能很重要,因为我看到许多关系被抛弃了。
我的max_connections设置与推荐的设置不匹配(在链接2中)
mysql max_connections应该等于max_active + max_idle
我做了什么
因此,根据文章的建议,我做了以下两件事:
这个没有帮助
第二天,在高峰时段进行了以下观察:
最后,我的问题:
很明显,从应用服务器进行数据库连接的方式存在问题。 所以我有两个方向来推进这个分析。
我的问题是我应该采取哪些措施?
1。问题不在于连接池设置。代码是造成问题的原因
代码中可能存在未关闭数据库连接的位置。这导致大量连接被打开。
代码使用GenericDao,它在每个Dao类中都有扩展。 GenericDao使用Spring的JpaTemplate来获取EntityManager实例,该实例又用于所有数据库操作。我的理解是使用JpaTemplate处理内部关闭数据库连接的细节。
那么,我究竟应该在哪里寻找可能的连接泄漏?
2。问题在于连接pool / mysql配置参数。但是,我需要进一步调整优化
如果是,我应该查看哪些参数? 我应该收集一些数据来确定我的连接池的更合适的值。 (例如,对于max_active,max_idle,max_connections)
附录:完整的连接池配置
<bean id="dataSource" class="org.apache.tomcat.jdbc.pool.DataSource" destroy-method="close">
<property name="driverClassName" value="com.mysql.jdbc.Driver" />
<property name="url" value="jdbc:mysql://xx.xx.xx.xx" />
<property name="username" value="xxxx" />
<property name="password" value="xxxx" />
<property name="initialSize" value="10" />
<property name="maxActive" value="350" />
<property name="maxIdle" value="250" />
<property name="minIdle" value="90" />
<property name="timeBetweenEvictionRunsMillis" value="30000" />
<property name="removeAbandoned" value="true" />
<property name="removeAbandonedTimeout" value="60" />
<property name="abandonWhenPercentageFull" value="100" />
<property name="testOnBorrow" value="true" />
<property name="validationQuery" value="SELECT 1" />
<property name="validationInterval" value="30000" />
<property name="logAbandoned" value="true" />
<property name="jmxEnabled" value="true" />
</bean>
答案 0 :(得分:10)
对于OP而言,这已经非常晚了,但未来可能会对其他人有所帮助:
我在长期运行批处理作业的生产环境中遇到类似的问题。问题是如果您的代码需要的连接时间超过属性指定的时间:
name="removeAbandonedTimeout" value="60
您已启用:
<property name="removeAbandoned" value="true" />
然后在60秒后处理过程中断开连接。一种可能的解决方法(对我来说没有用)是启用拦截器:
jdbcInterceptors="ResetAbandonedTimer"
这将为每次发生的读/写重置该连接的放弃计时器。不幸的是,在我的情况下,在读取/写入数据库之前,处理有时仍然需要比超时更长的时间。所以我被迫要么超过超时长度,要么禁用removeAbandonded(我选择了以前的解决方案)。
希望如果他们遇到类似的东西,这可以帮助别人!
答案 1 :(得分:1)
我最近被要求调查为什么生产系统有时会下降。我想分享我的发现,因为它涉及事件的关联,以便将JVM tomcat应用程序与上面概述的JDBC问题一起实际崩溃应用程序。这是使用mysql作为后端,所以可能对这种情况最有用,但如果在另一个平台上遇到问题可能会相同。
只需关闭连接并不意味着应用程序已损坏
这是在grails应用程序下,但将与所有与JVM相关的应用程序相关:
tomcat/context.xml
db配置,注意非常小的db池和
removeAbandonedTimeout="10"
你们我们想让事情破裂
<Resource
name="jdbc/TestDB" auth="Container" type="javax.sql.DataSource"
driverClassName="com.mysql.jdbc.Driver"
url="jdbc:mysql://127.0.0.1:3306/test"
username="XXXX"
password="XXXX"
testOnBorrow="true"
testWhileIdle="true"
testOnReturn="true"
factory="org.apache.tomcat.jdbc.pool.DataSourceFactory"
removeAbandoned="true"
logAbandoned="true"
removeAbandonedTimeout="10"
maxWait="5000"
initialSize="1"
maxActive="2"
maxIdle="2"
minIdle="2"
validationQuery="Select 1" />
每分钟运行一次石英作业,而不是我认为第一次尝试就死的应用程序非常重要:
class Test2Job {
static triggers = {
cron name: 'test2', cronExpression: "0 0/1 * * * ?"
}
def testerService
def execute() {
println "starting job2 ${new Date()}"
testerService.basicTest3()
}
}
现在我们的testService带有评论,请关注评论:
def dataSource
/**
* When using this method in quartz all the jdbc settings appear to get ignored
* the job actually completes notice huge sleep times compared to basicTest
* strange and very different behaviour.
* If I add Tester t = Tester.get(1L) and then execute below query I will get
* connection pool closed error
* @return
*/
def basicTest2() {
int i=1
while (i<21) {
def sql = new Sql(dataSource)
def query="""select id as id from tester t
where id=:id"""
def instanceList = sql.rows(query,[id:i as Long],[timeout:90])
sleep(11000)
println "-- working on ${i}"
def sql1 = new Sql(dataSource)
sql1.executeUpdate(
"update tester t set t.name=? where t.id=?",
['aa '+i.toString()+' aa', i as Long])
i++
sleep(11000)
}
println "run ${i} completed"
}
/**
* This is described in above oddity
* so if this method is called instead you will see connection closed issues
*/
def basicTest3() {
int i=1
while (i<21) {
def t = Tester.get(i)
println "--->>>> test3 t ${t.id}"
/**
* APP CRASHER - This is vital and most important
* Without this declared lots of closed connections and app is working
* absolutely fine,
* The test was originally based on execRun() which returns 6650 records or something
* This test query is returned in time and does not appear to crash app
*
* The moment this method is called and please check what it is currently doing. It is simply
* running a huge query which go beyond the time out values and as explained in previous emails MYSQL states
*
* The app is then non responsive and logs clearly show application is broke
*/
execRun2()
def sql1 = new Sql(dataSource)
sleep(10000)
sql1.executeUpdate("update tester t set t.name=? where t.id=?",['aa '+i.toString()+' aa', t.id])
sleep(10000)
i++
}
}
def execRun2() {
def query="""select new map (t as tester) from Tester t left join t.children c
left join t.children c
left join c.childrena childrena
left join childrena.childrenb childrenb
left join childrenb.childrenc childrenc , Tester t2 left join t2.children c2 left join t2.children c2
left join c2.childrena children2a
left join children2a.childrenb children2b
left join children2b.childrenc children2c
where ((c.name like (:name) or
childrena.name like (:name) or
childrenb.name like (:name) or (childrenc is null or childrenc.name like (:name))) or
(
c2.name like (:name) or
children2a.name like (:name) or
children2b.name like (:name) or (children2c is null or children2c.name like (:name))
))
"""
//println "query $query"
def results = Tester.executeQuery(query,[name:'aa'+'%'],[timeout:90])
println "Records: ${results.size()}"
return results
}
/**
* This is no different to basicTest2 and yet
* this throws a connection closed error and notice it is 20 not 20000
* quite instantly a connection closed error is thrown when a .get is used vs
* sql = new Sql(..) is a manuall connection
*
*/
def basicTest() {
int i=1
while (i<21) {
def t = Tester.get(i)
println "--- t ${t.id}"
sleep(20)
//println "publishing event ${event}"
//new Thread({
// def event=new PurchaseOrderPaymentEvent(t,t.id)
// publishEvent(event)
//} as Runnable ).start()
i++
}
}
只有当查询花费的时间比预期的时间长,但必须有另一个元素时,查询本身必须在MYSQL上进行操作,即使它被杀死了。 MYSQL正在吃它处理它。
我认为发生了什么
job 1 - hits app -> hits mysql -> (9/10 left)
{timeout} -> app killed -> mysql running (9/10)
job 2 - hits app -> hits mysql -> (8/10 left)
{timeout} -> app killed -> mysql running (8/10)
.....
job 10 - hits app -> hits mysql -> (10/10 left)
{timeout} -> app killed -> mysql running (10/10)
job 11 - hits app ->
如果到目前为止job1尚未完成,那么我们在游泳池中没有任何东西应用程序现在只是破坏了... jdbc错误抛出等等。没关系,如果它在崩溃后完成..
您可以checking mysql监控正在发生的事情 它似乎运行的时间更长,这违背了他们所建议的这个值应该做的事情,但是这可能不是真的基于任何一个并且与其他地方的问题有关。
虽然测试发现有两种状态:发送数据/发送给客户:
| 92 | root | localhost:58462 | test | Query | 80 | Sending data | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
| 95 | root | localhost:58468 | test | Query | 207 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
| 96 | root | localhost:58470 | test | Query | 147 | Sending data | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
| 97 | root | localhost:58472 | test | Query | 267 | Sending data | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
| 98 | root | localhost:58474 | test | Sleep | 18 | | NULL |
| 99 | root | localhost:58476 | test | Query | 384 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
| 100 | root | localhost:58478 | test | Query | 327 | Sending data | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
秒后:
| 91 | root | localhost:58460 | test | Query | 67 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
| 92 | root | localhost:58462 | test | Query | 148 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
| 97 | root | localhost:58472 | test | Query | 335 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test | |
| 100 | root | localhost:58478 | test | Query | 395 | Sending to client | select tester0_.id as col_0_0_ from tester tester0_ left outer join tester_childa children1_ on test |
Seconds after that: (all dead)
| 58 | root | localhost | NULL | Query | 0 | starting | show processlist |
| 93 | root | localhost:58464 | test | Sleep | 167 | | NULL |
| 94 | root | localhost:58466 | test | Sleep | 238 | | NULL |
| 98 | root | localhost:58474 | test | Sleep | 74 | | NULL |
| 101 | root | localhost:58498 | test | Sleep | 52 | | NULL |
可能需要创建一个脚本来监控进程列表,并且可能是一个更深层的结果集,其中包含运行的精确查询,以确定哪些查询事件正在查杀您的应用
答案 2 :(得分:0)
代码使用GenericDao,它在每个Dao类中都有扩展。 GenericDao使用Spring的JpaTemplate来获取EntityManager实例,该实例又用于所有数据库操作。我的理解是使用JpaTemplate处理内部关闭数据库连接的细节。
这可能是您问题的根源,您不应该使用JpaTemplate
获取EntityManager
这将为您提供非托管Entitymanager
。实际上你根本不应该使用JpaTemplate
。
建议根据普通EntityManager
API编写daos,然后像往常一样注入EntityManager
(使用@PersistenceContext
)。
如果您真的想使用JpaTemplate
,请使用execute
方法并传入JpaCallback
,这将为您提供托管EntityManager
。
另外请确保您没有正确的tx设置连接,因为Spring不知道它应该关闭连接,因此无法正确设置连接。