Hazelcast IMap.tryLock with a tryTimeout of 100 ms gets stuck for more than 1 minute

Time: 2018-05-15 13:57:12

Tags: java hazelcast hazelcast-imap

We are using Hazelcast 3.9.2.

The cluster has 2 nodes, each running Windows Server 2012 R2 Standard with Oracle JAVA_VERSION="1.8.0_144".

Between 2 and 20 clients run on separate VMs: 3.10.0-327.28.3.el7.x86_64 #1 SMP Fri Aug 12 13:21:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux with IBM JAVA_VERSION="1.7.1_64".

hazelcast.xml fragment:

<map name="lock*">
    <in-memory-format>BINARY</in-memory-format>
    <statistics-enabled>true</statistics-enabled>
    <backup-count>1</backup-count>
    <eviction-policy>NONE</eviction-policy>
</map>

This is our hazelcast-client.xml:

<hazelcast-client xmlns="http://www.hazelcast.com/schema/client-config" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.hazelcast.com/schema/client-config file:///C:/caching/hazelcast-client-config-3.9.xsd">
<group>
    <name>OUR_GROUP_NAME</name>
</group>
<properties>
    <property name="hazelcast.client.shuffle.member.list">true</property>
    <property name="hazelcast.client.heartbeat.timeout">60000</property>
    <property name="hazelcast.client.heartbeat.interval">5000</property>
    <property name="hazelcast.client.event.thread.count">10</property>
    <property name="hazelcast.client.event.queue.capacity">1000000</property>
    <property name="hazelcast.client.invocation.timeout.seconds">35</property>
    <property name="hazelcast.client.statistics.enabled">true</property>
</properties>
<network>
    <cluster-members>
        <address>tvlcacheqa1.blqa.qa:5709</address>
        <address>tvlcacheqa2.blqa.qa:5709</address>
    </cluster-members>
    <smart-routing>true</smart-routing>
    <redo-operation>true</redo-operation>
    <connection-attempt-period>15000</connection-attempt-period>
    <connection-attempt-limit>1048576</connection-attempt-limit>
    <socket-options>
        <tcp-no-delay>false</tcp-no-delay>
        <keep-alive>true</keep-alive>
        <reuse-address>true</reuse-address>
        <linger-seconds>5</linger-seconds>
        <timeout>-1</timeout>
        <buffer-size>64</buffer-size>
    </socket-options>
</network>

<near-cache name="cache*">
    <in-memory-format>OBJECT</in-memory-format>
    <invalidate-on-change>true</invalidate-on-change>
    <time-to-live-seconds>1800</time-to-live-seconds>
    <max-idle-seconds>1800</max-idle-seconds>
    <eviction eviction-policy="LRU" max-size-policy="ENTRY_COUNT" size="10000"/>
</near-cache>
</hazelcast-client>

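Every setting not mentioned above is left at its DEFAULT, and we do not change any settings in code.

During batch processing, we run this code concurrently on every client: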

synchronizer.startSyncSection(key, 100);
try {
     doSomeCriticalStuff();
} finally {
     synchronizer.endSyncSection(key);
}

This is our IMap-based implementation of the Synchronizer functionality on Hazelcast:

@Override
public void startSynchedSection(MultiKey<?> key, long tryLockTimeoutInMs, long releaseLockTimeoutInMs) {
    keyNullCheck(key);
    tryLockTimeoutInMs = Math.max(tryLockTimeoutInMs, minimumObtainLockTimeoutInMs);
    if (isClusterReady()) {
        boolean locked = false;
        try {
            locked = this.locks.tryLock(key, tryLockTimeoutInMs, TimeUnit.MILLISECONDS, releaseLockTimeoutInMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            throw new TechnicalException(e);
        }
        if (!locked) {
            throw new SyncTimeoutException(FAILED_TO_OBTAIN_EXCEPTION + key);
        }
        int lockCounter = incrementLockCounter(key);
    } else {
        throw new SyncTimeoutException(CLUSTER_NOT_READY_EXCEPTION);
    }
}
/** Note This should be called in a FINALLY section!!! */
@Override
public void endSynchedSection(MultiKey<?> key) {
    keyNullCheck(key);
    int lockCounterBefore = getThreadLocalCounter(key.toString()).get();
    if (lockCounterBefore == 0) {
        return;
    }
    try {
        int lockCounterAfter = decrementLockCounter(key);
        if (this.locks.isLocked(key)) {
            this.locks.unlock(key);
        }
    } catch (OperationTimeoutException e) {
        this.logger.warn("endSynchedSection - Lock-> {} was not released properly in Hazelcast because of exception:\n{}\n in Thread={}", key, e
            .getMessage(), Thread.currentThread().getName());
    }
}
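The guard in endSynchedSection is a check-then-act: IMap.isLocked(key) reports whether the key is locked by anyone, while IMap.unlock(key) is documented to throw IllegalMonitorStateException when the calling thread is not the holder (for example, if the lease expired and another thread took the lock in between). A minimal sketch of that pattern follows, using java.util.concurrent.locks.ReentrantLock as a local stand-in for the map lock so it runs without a cluster. That substitution, and the names UnlockGuardDemo, unlockIfLocked and unlockIfOwner, are assumptions made for illustration. It only illustrates the guard and is not a claim about the cause of the hang.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class UnlockGuardDemo {

    /** Mirrors the guard above: "is it locked by anyone?" followed by unlock. */
    static String unlockIfLocked(ReentrantLock lock) {
        try {
            if (lock.isLocked()) {
                lock.unlock(); // throws if the current thread is not the owner
                return "unlocked";
            }
            return "skipped";
        } catch (IllegalMonitorStateException e) {
            return "not-owner";
        }
    }

    /** Ownership-based guard: only unlock what this thread actually holds. */
    static String unlockIfOwner(ReentrantLock lock) {
        if (lock.isHeldByCurrentThread()) {
            lock.unlock();
            return "unlocked";
        }
        return "skipped";
    }

    /** Another thread holds the lock while the calling thread tries both guards. */
    static String[] demo() {
        final ReentrantLock lock = new ReentrantLock();
        final CountDownLatch held = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(new Runnable() {
            @Override
            public void run() {
                lock.lock();
                try {
                    held.countDown();
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    lock.unlock();
                }
            }
        });
        holder.start();
        try {
            held.await();
            return new String[] { unlockIfLocked(lock), unlockIfOwner(lock) };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the holder thread finish
        }
    }

    public static void main(String[] args) {
        String[] result = demo();
        System.out.println(result[0] + " / " + result[1]);
    }
}
```

In the Hazelcast code, the thread-local lock counter already plays a similar ownership role, so the isLocked check mostly adds an extra remote call per release; note that only OperationTimeoutException is caught there.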

Sometimes (usually while our batches are running) a thread gets stuck on this IMap call:

locked = this.locks.tryLock(key, tryLockTimeoutInMs, TimeUnit.MILLISECONDS, releaseLockTimeoutInMs, TimeUnit.MILLISECONDS);

where this.locks is declared as private IMap<String, String> locks;

Although we set tryLockTimeoutInMs = 100 ms, the thread can hang for up to 2 minutes!
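To measure the gap between the requested timeout and the time the call really blocks, the lock attempt can be bracketed with System.nanoTime(). The sketch below is a minimal illustration, not our production code: it uses java.util.concurrent.locks.ReentrantLock as a local stand-in for this.locks so that it runs without a cluster, and TryLockTiming, Attempt and timedTryLock are hypothetical names. It avoids lambdas so that it also compiles on a Java 7 client.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class TryLockTiming {

    /** Outcome of one timed lock attempt: success flag and time spent blocked. */
    static final class Attempt {
        final boolean locked;
        final long elapsedMs;

        Attempt(boolean locked, long elapsedMs) {
            this.locked = locked;
            this.elapsedMs = elapsedMs;
        }
    }

    /** Tries to take the lock within timeoutMs and records how long the call blocked. */
    static Attempt timedTryLock(ReentrantLock lock, long timeoutMs) throws InterruptedException {
        long start = System.nanoTime();
        boolean locked = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Attempt(locked, elapsedMs);
    }

    /** Another thread holds the lock, so a 100 ms attempt is expected to fail. */
    static Attempt demo() {
        final ReentrantLock lock = new ReentrantLock();
        final CountDownLatch held = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(new Runnable() {
            @Override
            public void run() {
                lock.lock();
                try {
                    held.countDown();
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    lock.unlock();
                }
            }
        });
        holder.start();
        try {
            held.await();
            return timedTryLock(lock, 100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the holder thread finish
        }
    }

    public static void main(String[] args) {
        Attempt a = demo();
        System.out.println("locked=" + a.locked + " elapsedMs=" + a.elapsedMs);
    }
}
```

In the real code the same nanoTime bracket would go around this.locks.tryLock(...), so that attempts blocking well beyond tryLockTimeoutInMs can be logged together with the key and the thread name.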

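Unfortunately, we cannot reproduce this in our test environment, but in production we see reports like this from the Dynatrace tool: https://user-images.githubusercontent.com/12655866/39863775-74a3da5a-53fc-11e8-96b4-d55bea1f5e06.PNG

I went through the logs of every cluster member and client and found nothing unusual. There were no warnings and no lost connections at the time.

I have a few hypotheses: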

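  1. Because of my business requirements, I call IMap.tryLock(key, tryTime, TimeUnit.MILLISECONDS, leaseTime, TimeUnit.MILLISECONDS); with leaseTime = 30 minutes, i.e. I release the lock myself: if (this.locks.isLocked(key)) this.locks.unlock(key); Could very frequent calls to IMap.isLocked(key) and/or IMap.unlock(key) running concurrently with IMap.tryLock be the cause?
  2. My members and clients run different JREs. In particular, the client side runs on a WebSphere 8.5.5.12 application server with IBM Java 1.7.1_64. Perhaps IMap.tryLock uses architecture-specific code (so it is 'unsafe'), and that is why we see the hang in this method.
  3. There may be a mistake in my hazelcast.xml settings, since we have only just started using Hazelcast in our project.
  4. Any other suggestions?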

0 answers:

No answers yet