Connection reset with playORM when running a large amount of data

Time: 2012-11-09 19:23:11

Tags: java cassandra playorm astyanax

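I have a hadoop process that connects to a cassandra keyspace in its reduce phase. The data is saved by playORM.

What happens: I run this hadoop process and cassandra on the same machine, so playORM simply connects to cassandra on localhost. When I process a small amount of data, the process runs perfectly fine, but when I process a larger amount (in this case only 500,000 records), I get the exception below.

I wonder whether the problem could be in the astyanax pool configuration (which is done by playORM, so I do not know how to change those settings), in playORM itself, or even in my Cassandra configuration. Right now everything runs on one host, and I think it could get worse once a cluster is configured, since many hadoop machines will connect to many cassandra machines.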

Any hints on what could be wrong?

CF=[tablename=Localization] persist rowkey=1bd9b46a-5b66-41ae-9756-dd91f44194ea
CF=User index persist(cf=[tablename=User])=[rowkey=/User/id] (table found, colmeta not found)
CF=[tablename=User] persist rowkey=1bd9b46a-5b66-41ae-9756-dd91f44194ea
java.lang.RuntimeException: com.netflix.astyanax.connectionpool.exceptions.ConnectionAbortedException: ConnectionAbortedException: [host=localhost(127.0.0.1):9160, latency=611(611), attempts=1] org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
        at com.alvazan.orm.layer9z.spi.db.cassandra.CassandraSession.sendChanges(CassandraSession.java:110)
        at com.alvazan.orm.logging.NoSqlRawLogger.sendChanges(NoSqlRawLogger.java:50)
        at com.alvazan.orm.layer5.nosql.cache.NoSqlWriteCacheImpl.flush(NoSqlWriteCacheImpl.java:125)
        at com.alvazan.orm.layer5.nosql.cache.NoSqlReadCacheImpl.flush(NoSqlReadCacheImpl.java:178)
        at com.alvazan.orm.layer0.base.BaseEntityManagerImpl.flush(BaseEntityManagerImpl.java:182)
        at com.s1mbi0se.dmp.da.dao.UserDao.insertOrUpdateUser(UserDao.java:24)
        at com.s1mbi0se.dmp.da.dao.UserDao.insertOrUpdateUserLocalization(UserDao.java:75)
        at com.s1mbi0se.dmp.da.service.DataAccessService.insertLocalizationForUser(DataAccessService.java:44)
        at com.s1mbi0se.dmp.module.LocalizationModule.persistData(LocalizationModule.java:218)
        at com.s1mbi0se.dmp.processor.mapred.SelectorReducer.reduce(SelectorReducer.java:60)
        at com.s1mbi0se.dmp.processor.mapred.SelectorReducer.reduce(SelectorReducer.java:1)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Caused by: com.netflix.astyanax.connectionpool.exceptions.ConnectionAbortedException: ConnectionAbortedException: [host=localhost(127.0.0.1):9160, latency=611(611), attempts=1] org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
        at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:193)
        at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:60)
        at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:27)
        at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$1.execute(ThriftSyncConnectionFactoryImpl.java:131)
        at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:52)
        at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:229)
        at com.netflix.astyanax.thrift.ThriftKeyspaceImpl.executeOperation(ThriftKeyspaceImpl.java:455)
        at com.netflix.astyanax.thrift.ThriftKeyspaceImpl.access$400(ThriftKeyspaceImpl.java:62)
        at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1.execute(ThriftKeyspaceImpl.java:115)
        at com.alvazan.orm.layer9z.spi.db.cassandra.CassandraSession.sendChangesImpl(CassandraSession.java:131)
        at com.alvazan.orm.layer9z.spi.db.cassandra.CassandraSession.sendChanges(CassandraSession.java:108)
        ... 14 more
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
        at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
        at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
        at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:913)
        at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:899)
        at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1$1.internalExecute(ThriftKeyspaceImpl.java:121)
        at com.netflix.astyanax.thrift.ThriftKeyspaceImpl$1$1.internalExecute(ThriftKeyspaceImpl.java:118)
        at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:55)
        ... 23 more
Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(Unknown Source)
        at java.net.SocketInputStream.read(Unknown Source)
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
        ... 36 more

1 answer:

Answer 0 (score: 2)

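NOTE: I think I ran into this at one point too, bumped up either the timeout or the connection pool size in astyanax, and it went away, so try that as well (although a connection reset is usually the remote server's fault, i.e. cassandra).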

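OK, a connection reset usually happens because the other end (cassandra) closed your connection. To be 100% sure, capture the traffic with wireshark and you should see which end is closing the socket.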

Be careful about what you read in this post, though...

java.net.SocketException: Connection reset

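But basically, I wrote channelmanager on sourceforge way back, before mina, netty, etc. Most of the time, when the other end closes the socket properly, your read returns -1; to close properly, the peer has to send certain packets. If it just goes away instead, you can end up with neat exceptions like connection reset.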
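The difference between the two kinds of close can be reproduced on a single machine. The following is a minimal sketch, not taken from astyanax or cassandra: the class name `CloseDemo` and the method `readAfterPeerClose` are hypothetical, and it assumes that enabling `SO_LINGER` with a timeout of 0 makes the operating system send a TCP RST on close, which is the usual behavior but depends on the platform.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo class: shows what a client read sees after the peer closes.
class CloseDemo {

    /**
     * Opens a throwaway server on the loopback interface, connects a client,
     * closes the server side of the connection, then reads on the client.
     * Returns a short description of what the read observed.
     */
    static String readAfterPeerClose(boolean abortive) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            Socket accepted = server.accept();
            if (abortive) {
                // Assumption: linger timeout 0 turns close() into an abortive
                // close (RST) rather than an orderly one (FIN).
                accepted.setSoLinger(true, 0);
            }
            accepted.close();
            int n = client.getInputStream().read();
            return n == -1 ? "EOF (-1)" : "data byte " + n;
        } catch (IOException e) {
            // Setup failures also land here; good enough for a sketch.
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println("orderly close  -> " + readAfterPeerClose(false));
        System.out.println("abortive close -> " + readAfterPeerClose(true));
    }
}
```

With an orderly close the client read returns -1. With the abortive close the read typically fails with a `SocketException`, the same exception type that sits at the bottom of the stack trace in the question.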

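I would suggest fiddling with the astyanax connection pool. Still, take a look in wireshark, google how a tcp teardown is supposed to happen, and check whether cassandra is failing to tear the connection down properly.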

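If you are on linux, try netstat -anp | grep {pid} so you can see which ports your client process is using, then look for packets on those ports in wireshark. Also, run a test to make sure astyanax is keeping its pool properly: run the netstat command a few times while the job is running to make sure astyanax is not creating sockets, dropping them, and creating them again (if it dropped one and you then wrote to it, you could get the error above).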
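Comparing netstat snapshots by eye gets tedious, so the check can be scripted. The sketch below assumes the Linux net-tools column layout for `netstat -anp` (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name). The class `NetstatChurn`, its methods, the pid 4242, and the sample lines are all made up for illustration; only port 9160 comes from the stack trace in the question.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: finds client sockets that vanished between two netstat snapshots.
class NetstatChurn {

    // Made-up sample snapshots, for illustration only.
    static final String BEFORE =
        "tcp        0      0 127.0.0.1:40001         127.0.0.1:9160          ESTABLISHED 4242/java\n"
      + "tcp        0      0 127.0.0.1:40002         127.0.0.1:9160          ESTABLISHED 4242/java\n";
    static final String AFTER =
        "tcp        0      0 127.0.0.1:40001         127.0.0.1:9160          ESTABLISHED 4242/java\n"
      + "tcp        0      0 127.0.0.1:40007         127.0.0.1:9160          ESTABLISHED 4242/java\n";

    /** Local ports of tcp connections owned by pid that point at remotePort. */
    static Set<Integer> localPorts(String netstatOutput, int pid, int remotePort) {
        Set<Integer> ports = new TreeSet<>();
        for (String line : netstatOutput.split("\\R")) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 7 || !f[0].startsWith("tcp")) continue; // header, udp, unix sockets
            if (!f[6].startsWith(pid + "/")) continue;             // another process
            if (portOf(f[4]) != remotePort) continue;              // not a cassandra connection
            ports.add(portOf(f[3]));
        }
        return ports;
    }

    /** Port after the last ':' of an address column, or -1 if it is not numeric (e.g. "*"). */
    static int portOf(String address) {
        try {
            return Integer.parseInt(address.substring(address.lastIndexOf(':') + 1));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Ports present in the earlier snapshot but missing from the later one. */
    static Set<Integer> dropped(Set<Integer> before, Set<Integer> after) {
        Set<Integer> gone = new TreeSet<>(before);
        gone.removeAll(after);
        return gone;
    }

    public static void main(String[] args) {
        Set<Integer> gone = dropped(localPorts(BEFORE, 4242, 9160), localPorts(AFTER, 4242, 9160));
        System.out.println("dropped local ports: " + gone);
    }
}
```

If the set of dropped ports keeps changing between snapshots while the job runs, the pool is recreating connections rather than reusing them, which is the situation described above.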

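The java nio stuff under the covers has never been completely solid... to this day, I still have unit tests that demonstrate bugs in the nio libraries on different operating systems.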

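Out of curiosity, how many flushes are you pipelining? I notice you are in the middle of a write, and the read that fails is basically the one that gets the status of whether the write succeeded or not.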

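In the next few months, we hope to have a generic map/reduce that feeds actual entities to your map/reduce code. We finally found a new developer and sent out an offer, so we will soon be adding to our capacity.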

Another good post is this one:

http://kb.realvnc.com/questions/75/I%27m+receiving+the+error+%22Connection+reset+by+peer+%2810054%29%22.+

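wireshark can really tell you the details of what is happening at the tcp layer. I have always wanted to dig further into whether this is astyanax's fault or cassandra's, but I have not had the time.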

Dean