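We are deleting a large number of records from Cassandra and are getting the errors below. We also get the same error when we insert a large number of records: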
Error performing remove on 10.130.279.40:9160: exception 'TTransportException' with message 'TSocket: timed out reading 4 bytes from 10.130.279.40:9160' in /home/zonefiles/php/thrift/transport/TSocket.php:268
Stack trace:
0 /home/zonefiles/php/thrift/transport/TTransport.php(87): TSocket->read(4)
1 /home/zonefiles/php/thrift/transport/TFramedTransport.php(135): TTransport->readAll(4)
2 /home/zonefiles/php/thrift/transport/TFramedTransport.php(102): TFramedTransport->readFrame()
3 [internal function]: TFramedTransport->read(8192)
4 /home/zonefiles/php/thrift/packages/cassandra/Cassandra.php(691): thrift_protocol_read_binary(Object(TBinaryProtocolAccelerated), 'cassandra_Cassa...', false)
5 /home/zonefiles/php/thrift/packages/cassandra/Cassandra.php(664): CassandraClient->recv_remove()
6 [internal function]: CassandraClient->remove('CUSTOMERSERVICE...', Object(cassandra_ColumnPath), 1301555573936295, 1)
7 /home/zonefiles/php/connection.php(230): call_user_func_array(Array, Array)
8 /home/zonefiles/php/columnfamily.php(582): ConnectionPool->call('remove', 'CUSTOMERSERVICE...', Object(cassandra_ColumnPath), 1301555573936295, 1)
9 /home/zonefiles/php/delete.php(34): ColumnFamily->remove('CUSTOMERSERVICE...')
10 {main}
Error connecting to 10.130.279.40:9160: exception 'TTransportException' with message 'TSocket: timed out reading 4 bytes from 10.130.279.40:9160' in /home/zonefiles/php/thrift/transport/TSocket.php:268
Stack trace:
0 /home/zonefiles/php/thrift/transport/TTransport.php(87): TSocket->read(4)
1 /home/zonefiles/php/thrift/transport/TFramedTransport.php(135): TTransport->readAll(4)
2 /home/zonefiles/php/thrift/transport/TFramedTransport.php(102): TFramedTransport->readFrame()
3 [internal function]: TFramedTransport->read(8192)
4 /home/zonefiles/php/thrift/packages/cassandra/Cassandra.php(1015): thrift_protocol_read_binary(Object(TBinaryProtocolAccelerated), 'cassandra_Cassa...', false)
5 /home/zonefiles/php/thrift/packages/cassandra/Cassandra.php(992): CassandraClient->recv_describe_version()
6 /home/zonefiles/php/connection.php(63): CassandraClient->describe_version()
7 /home/zonefiles/php/connection.php(163): ConnectionWrapper->__construct('CDTMain1', '10.130.279.40:9...', NULL, true, 5000, 5000)
8 /home/zonefiles/php/connection.php(254): ConnectionPool->make_conn()
9 /home/zonefiles/php/connection.php(241): ConnectionPool->handle_conn_failure(Object(ConnectionWrapper), 'remove', Object(TTransportException), 1)
10 /home/zonefiles/php/columnfamily.php(582): ConnectionPool->call('remove', 'CUSTOMERSERVICE...', Object(cassandra_ColumnPath), 1301555573936295, 1)
11 /home/zonefiles/php/delete.php(34): ColumnFamily->remove('CUSTOMERSERVICE...')
12 {main}
Here is the PHP we use to produce the error:
<?php
set_time_limit(2000);
require 'connection.php';
require 'columnfamily.php';
$servers[0]['host'] = 'private ip';
$servers[0]['port'] = '9160';
$conn = new Connection('Server11', $servers);
$urlFamily = new ColumnFamily($conn, 'Domain'); // ColumnFamily
$start = microtime(true);
$limit = 100000000;
$rows = $urlFamily->get_range($key_start='', $key_finish='zzzzzzzzzzzzzzz',100000000);
$num = 0;
$delCount = 0;
foreach ($rows as $key => $columns) {
    // Do stuff with $key or $columns
    if (strpos($key, ' .net') !== false) {
        //echo 'deleting ' . $key . "\n";
        $urlFamily->remove($key);
        $delCount++;
    }
    if ($num++ > 100000000) break;
    //$num++;
    if ($num % 100000 == 0) echo $num . "\n";
}
$end = microtime(true);
echo $num . " total\n";
echo $delCount . ' deleted in ' . ($end - $start) . " seconds\n";
echo $delCount / ($end - $start) . " deleted per second\n";
?>
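We are running PHP 5.3.5 with Thrift 0.5.0 on Fedora 14 (Laughlin).
One theory is that this happens because Cassandra cannot process the commands fast enough. Do you agree or disagree? Have you seen this before?
If you suggest deleting some other way (truncate, for example), how can we still prevent this problem when we use Cassandra for other operations?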
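Answer 0 (score: 2)
Are those just log messages, or exceptions that are actually being raised? phpcassa calls error_log() every time it catches an exception of this kind, before retrying with a different connection. Basically, this means you should keep an eye on the logged stack traces, but you don't need to worry about them too much.
These are client-side socket timeouts, which means the call took longer than the default timeout of 5 seconds. Why that happens in the first place depends largely on how Cassandra is behaving. Monitoring Cassandra is probably the best place to start.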
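To illustrate the log-then-retry behavior described above, here is a minimal, self-contained sketch. It is not phpcassa's actual code: callWithRetry, its parameters, and the backoff delays are hypothetical names chosen for this example, and the failing operation is a stand-in rather than a real Cassandra call.

```php
<?php
// Hypothetical helper (not part of phpcassa): runs $operation and, if it
// throws, logs the failure and tries again, up to $maxRetries extra attempts.
function callWithRetry($operation, $maxRetries = 3, $baseDelayMs = 100)
{
    $attempt = 0;
    while (true) {
        try {
            return call_user_func($operation, $attempt);
        } catch (Exception $e) {
            // A logged failure here does not mean the call ultimately failed;
            // it only records that one attempt timed out or errored.
            error_log('Attempt ' . ($attempt + 1) . ' failed: ' . $e->getMessage());
            if ($attempt >= $maxRetries) {
                throw $e; // Retries exhausted: surface the exception.
            }
            // Exponential backoff (an assumption of this sketch) gives a
            // busy server some time to catch up before the next attempt.
            usleep($baseDelayMs * 1000 * (1 << $attempt));
            $attempt++;
        }
    }
}

// Stand-in operation: fails twice with a timeout-like error, then succeeds.
$calls = 0;
$result = callWithRetry(function ($attempt) use (&$calls) {
    $calls++;
    if ($calls < 3) {
        throw new RuntimeException('TSocket: timed out');
    }
    return 'removed';
}, 3, 1);

echo $result . ' after ' . $calls . " calls\n";
```

In the script from the question, the stand-in closure would presumably wrap a call such as $urlFamily->remove($key). Two log entries followed by a successful return is the "logged but not fatal" situation this answer describes; only an exception that escapes the helper is a real failure.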
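Answer 1 (score: 0)
According to my programmer, we actually solved this by raising the timeouts to a very high value. We were trying to import a 5 GB file, so I'm guessing each read took the db longer than 5 seconds.
Here are the specific timeouts we set (in milliseconds):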
$send_timeout = 60000;
$recv_timeout = 60000;