Broken pipe error on query from Aerospike

Time: 2018-04-11 10:34:49

Tags: aerospike

I have a namespace "test" and a set "demo". When I run "select * from test.demo" in the aql terminal, I get this error. What exactly causes the broken pipe?

[Image: aql error output]

I also get a warning message in the server log, shown below.

[Image: server log]

My aerospike.conf is:

service {
    paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
    proto-fd-max 15000
}

logging {
    file /var/log/aerospike/aerospike.log {
            context any info
    }
}

network {
    service {
            address any
            port 3000
    }

    heartbeat {
            mode multicast
            multicast-group 239.1.99.222
            port 9918

            # To use unicast-mesh heartbeats, remove the 3 lines above, and see
            # aerospike_mesh.conf for alternative.

            interval 150
            timeout 10
    }

    fabric {
            port 3001
    }

    info {
            port 3003
    }
}

namespace test {
    replication-factor 2
    memory-size 4G
    default-ttl 30d # 30 days, use 0 to never expire/evict.

    storage-engine memory
}
namespace bar {
    replication-factor 2
    memory-size 4G
    default-ttl 30d # 30 days, use 0 to never expire/evict.

    storage-engine memory

    # To use file storage backing, comment out the line above and use the
    # following lines instead.
    #       storage-engine device {
    #               file /opt/aerospike/data/bar.dat
    #               filesize 16G
    #               data-in-memory true # Store data in memory in addition to file.
    #       }
}

Can anyone figure out the cause?

1 Answer:

Answer 0 (score: 3)

I think the socket error occurs when the server tries to send the scan results over a socket on which the client has already timed out.

Error: (-10) Socket read error: 11, [::1]:3000, 36006

By default, the aql timeout is set to 1000 ms.

You can raise it to 100000 ms with the -T command-line option (or with set timeout in aql interactive mode).

aql -T 100000

-T, --timeout <ms>    Set the timeout (ms) for commands. Default: 1000

This option is equivalent to setting TotalTimeout in other clients.
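For a client other than aql, the same idea can be sketched in Python. This is a minimal sketch, assuming the Aerospike Python client, where `client.scan(namespace, set)` returns a scan object whose `results()` accepts a policy dict with a `total_timeout` key in milliseconds (this may differ by client version). `scan_with_timeout` is a hypothetical helper name, not part of the client.

```python
def scan_with_timeout(client, namespace, set_name, timeout_ms):
    """Run a full scan with an explicit total timeout in milliseconds.

    `client` is expected to be a connected Aerospike client, e.g. one
    created with aerospike.client({"hosts": [("127.0.0.1", 3000)]}).connect()
    (assumption: the Python client's scan()/results() API).
    """
    # Rough equivalent of `aql -T <timeout_ms>` for this one scan.
    policy = {"total_timeout": timeout_ms}
    scan = client.scan(namespace, set_name)
    # Each result is a (key, meta, bins) tuple; keep only the bins.
    return [bins for _key, _meta, bins in scan.results(policy)]
```

With a connected client, `scan_with_timeout(client, "test", "demo", 100000)` would correspond to running the query under `aql -T 100000`.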

Setting the timeout higher should help, but it does not explain why a basic scan takes this long.

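Below is an example of setting different client timeouts, which shows the client timing out before it receives the scan results. In the server log, you will see a TCP send error for the scan.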

WARNING (proto): (proto.c:693) send error - fd 32 Broken pipe
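The mechanism behind that warning can be reproduced without Aerospike at all. The following is a minimal sketch using only the Python standard library, assuming a Unix-like system: one end of a connected socket pair plays the client that gives up, the other plays the server that still tries to send. `send_after_peer_closed` is a hypothetical name used for illustration.

```python
import socket


def send_after_peer_closed():
    """Return the name of the error raised when writing to a socket
    whose peer has already closed its end."""
    server_side, client_side = socket.socketpair()
    try:
        # The "client" gives up, as aql does once its timeout expires.
        client_side.close()
        try:
            # The "server" still tries to deliver the result.
            server_side.sendall(b"scan result")
        except (BrokenPipeError, ConnectionResetError) as exc:
            return type(exc).__name__
        return None
    finally:
        server_side.close()


print(send_after_peer_closed())
```

The send fails on the writing side only, which matches the picture above: the client reports a read error of its own, and the server separately logs the failed send.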

Details from the aql console:

aql> set timeout 10
TIMEOUT = 10
aql> select * from test.demo
Error: (-10) Socket read error: 11, 127.0.0.1:3000, 58496

aql> select * from test.demo
Error: (-10) Socket read error: 115, 127.0.0.1:3000, 58498


aql> set timeout 100
TIMEOUT = 100
aql> select * from test.demo
Error: (-10) Socket read error: 115, 127.0.0.1:3000, 58492

aql> set timeout 1000
TIMEOUT = 1000
aql> select * from test.demo
+-----+-------+
| foo | bar   |
+-----+-------+
| 123 | "abc" |
+-----+-------+
1 row in set (0.341 secs)
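The numbers after "Socket read error:" look like OS errno values (an assumption based on the message format). On Linux, 11 is EAGAIN and 115 is EINPROGRESS, both of which are consistent with a non-blocking socket operation that had not completed when the timeout expired. A small sketch for decoding such numbers on the machine where the error occurred, since errno values are platform-specific (`describe_errno` is a hypothetical helper):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an OS error number."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# The two codes seen in the aql session above.
for code in (11, 115):
    print(code, *describe_errno(code))
```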

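If the default timeout was left at 1000 ms, it is still a mystery why your aql client would time out while returning a single record. Did you modify the timeout by any chance? Or is there a large number of records in the test namespace that belong to no set (the null set)?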