JGroups nodes on EC2 see each other but do not communicate

Time: 2012-11-09 17:00:19

Tags: java hibernate-search infinispan jgroups

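I am trying to use Hibernate Search so that all writes to the Lucene index from the jgroupsSlave nodes are sent to the jgroupsMaster node, and then the Lucene index is shared back to the slaves with Infinispan. Everything works locally, but on EC2 the nodes discover each other and then do not seem to communicate.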

Both nodes are sending are-you-alive messages to each other.

# master output sample
86522 [LockBreakingService,localCache,archlinux-37498] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0
86523 [LockBreakingService,LuceneIndexesLocking,archlinux-37498] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0
87449 [Timer-4,luceneCluster,archlinux-37498] DEBUG org.jgroups.protocols.FD  - sending are-you-alive msg to archlinux-57950 (own address=archlinux-37498)
87522 [LockBreakingService,localCache,archlinux-37498] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0
87523 [LockBreakingService,LuceneIndexesLocking,archlinux-37498] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0

# slave output sample
85499 [LockBreakingService,localCache,archlinux-57950] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0
85503 [LockBreakingService,LuceneIndexesLocking,archlinux-57950] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0
86190 [Timer-3,luceneCluster,archlinux-57950] DEBUG org.jgroups.protocols.FD  - sending are-you-alive msg to archlinux-37498 (own address=archlinux-57950)
86499 [LockBreakingService,localCache,archlinux-57950] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0
86503 [LockBreakingService,LuceneIndexesLocking,archlinux-57950] DEBUG org.infinispan.transaction.TransactionTable  - About to cleanup completed transaction. Initial size is 0

Security groups

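I have two jars, one for the master and one for the slave, each running on its own EC2 instance. I can ping each instance from the other, and both are in the same security group, which defines the following rules for communication between any machines in the group.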

ALL ports for ICMP; 0-65535 for TCP; 0-65535 for UDP

So I don't think this is a security group configuration problem.

hibernate.properties

# there is also a corresponding jgroupsSlave
hibernate.search.default.worker.backend=jgroupsMaster
hibernate.search.default.directory_provider = infinispan
hibernate.search.infinispan.configuration_resourcename=infinispan.xml
hibernate.search.default.data_cachename=localCache
hibernate.search.default.metadata_cachename=localCache

infinispan.xml

<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:5.1 http://www.infinispan.org/schemas/infinispan-config-5.1.xsd"
            xmlns="urn:infinispan:config:5.1">
    <global>
        <transport clusterName="luceneCluster" transportClass="org.infinispan.remoting.transport.jgroups.JGroupsTransport">
            <properties>
                <property name="configurationFile" value="jgroups-ec2.xml" />
            </properties>
        </transport>
    </global>

    <default>
        <invocationBatching enabled="true" />
        <clustering mode="repl">

        </clustering>
    </default>

    <!-- this is just so that each machine doesn't have to store the index
         in memory -->
    <namedCache name="localCache">
        <loaders passivation="false" preload="true" shared="false">
            <loader class="org.infinispan.loaders.file.FileCacheStore" fetchPersistentState="true" ignoreModifications="false" purgeOnStartup="false">
                <properties>
                    <property name="location" value="/tmp/infinspan/master" />
                    <!-- there is a corresponding /tmp/infinispan/slave in
                    the slave config -->
                </properties>
            </loader>
        </loaders>
    </namedCache>
</infinispan>

jgroups-ec2.xml

<config xmlns="urn:org:jgroups" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.2.xsd">
    <TCP
            bind_addr="${jgroups.tcp.address:127.0.0.1}"
            bind_port="${jgroups.tcp.port:7800}"
            loopback="true"
            port_range="30"
            recv_buf_size="20000000"
            send_buf_size="640000"
            max_bundle_size="64000"
            max_bundle_timeout="30"
            enable_bundling="true"
            use_send_queues="true"
            sock_conn_timeout="300"
            enable_diagnostics="false"

            bundler_type="old"

            thread_pool.enabled="true"
            thread_pool.min_threads="2"
            thread_pool.max_threads="30"
            thread_pool.keep_alive_time="60000"
            thread_pool.queue_enabled="false"
            thread_pool.queue_max_size="100"
            thread_pool.rejection_policy="Discard"

            oob_thread_pool.enabled="true"
            oob_thread_pool.min_threads="2"
            oob_thread_pool.max_threads="30"
            oob_thread_pool.keep_alive_time="60000"
            oob_thread_pool.queue_enabled="false"
            oob_thread_pool.queue_max_size="100"
            oob_thread_pool.rejection_policy="Discard"
            />
    <S3_PING secret_access_key="removed_for_stackoverflow" access_key="removed_for_stackoverflow" location="jgroups_ping" />

    <MERGE2 max_interval="30000"
            min_interval="10000"/>
    <FD_SOCK/>
    <FD timeout="3000" max_tries="3"/>
    <VERIFY_SUSPECT timeout="1500"/>
    <pbcast.NAKACK2
            use_mcast_xmit="false"
            xmit_interval="1000"
            xmit_table_num_rows="100"
            xmit_table_msgs_per_row="10000"
            xmit_table_max_compaction_time="10000"
            max_msg_batch_size="100"
            become_server_queue_size="0"/>
    <UNICAST2
            max_bytes="20M"
            xmit_table_num_rows="20"
            xmit_table_msgs_per_row="10000"
            xmit_table_max_compaction_time="10000"
            max_msg_batch_size="100"/>
    <RSVP />
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="400000"/>
    <pbcast.GMS print_local_addr="false" join_timeout="7000" view_bundling="true"/>
    <UFC max_credits="2000000" min_threshold="0.10"/>
    <MFC max_credits="2000000" min_threshold="0.10"/>
    <FRAG2 frag_size="60000"/>
</config>

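I copied this directly from the most recent infinispan-core distribution (5.2.0.Beta3, though I also tried what I think was 5.1.4). The only thing I changed was replacing their S3_PING element with mine. Again, I can see the nodes writing to S3 and they find each other, so I don't think that is the problem. I am also starting the master and slave with jgroups.tcp.address set to each instance's private IP address. I have tried some greatly simplified configurations as well, without any success.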

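What could the problem be? I have spent several days playing with this and it is driving me crazy. I think it has to be something in the JGroups configuration, since it works locally and only fails to communicate on EC2.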

Is there any other information you need to help figure this out?

1 answer:

Answer 0: (score: 6)

You are starting two JGroups channels, so you need to specify two JGroups configurations: one for Infinispan and one for the backend worker communication.

Both Infinispan and jgroupsMaster fall back to their default configuration unless you specify one, and the defaults use multicast, which does not work on EC2.

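You appear to have a correct configuration for the Infinispan index, but you also have to reconfigure the jgroupsMaster worker to use S3_PING or JDBC_PING. It probably works locally because the default configuration can auto-discover peers using multicast.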
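As a rough sketch of what that second configuration could look like in hibernate.properties: the fragment below assumes the Hibernate Search 4.x property name `hibernate.search.services.jgroups.configurationFile` (older releases used a different key, so check the documentation for your version), and `jgroups-backend-ec2.xml` is a hypothetical file name for a TCP plus S3_PING stack similar to the one in the question.

```properties
# Sketch only: the property name is assumed for Hibernate Search 4.x.
hibernate.search.default.worker.backend=jgroupsMaster

# Hypothetical file name. Point the backend channel at a stack that uses
# TCP with S3_PING (or JDBC_PING) instead of the multicast default.
hibernate.search.services.jgroups.configurationFile=jgroups-backend-ec2.xml
```

The slave would carry the same configurationFile line with its backend set to jgroupsSlave. Since both channels run in the same JVM, the backend stack may need its own bind port or a port range that does not collide with the Infinispan channel.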

This duplication is going to be resolved by HSEARCH-882, which I expect will simplify the configuration significantly.