Spark job (Java) cannot write data to the Elasticsearch cluster

Asked: 2017-06-13 14:49:26

Tags: apache-spark elasticsearch docker-compose

I am deploying an Elasticsearch cluster (version 5.4.0) on Windows through Docker, using Docker 17.04.0-ce and Compose 1.12.0. So far I have done the following:

1) I created a single Elasticsearch node with the following configuration:

  elasticsearch1:
    build: elasticsearch/
    container_name: es_1
    cap_add:
      - IPC_LOCK
    environment:
      - cluster.name=cp-es-cluster
      - node.name=cloud1
      - node.master=true
      - http.cors.enabled=true
      - http.cors.allow-origin="*"
      - bootstrap.memory_lock=true
      - discovery.zen.minimum_master_nodes=1
      - xpack.security.enabled=false
      - xpack.monitoring.enabled=false
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      docker_elk:
        aliases:
          - elasticsearch

This brings the node up, but it cannot be reached from Spark. I write the data with

JavaEsSparkSQL.saveToEs(aggregators.toDF(), collectionName +"/record");

and I get the following error, even though the node is up and running:

I/O exception (java.net.ConnectException) caught when processing request: Connection timed out: connect
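
For context, the write above sits in a job wired up roughly like the sketch below; the SparkSession setup, the input source and the es.nodes/es.port values are illustrative assumptions, not copied verbatim from my actual job:

import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL;

public class EsWriteJob {
    public static void main(String[] args) {
        // "docker-host" stands for the machine where the compose stack publishes 9200/9300.
        SparkConf conf = new SparkConf()
                .setAppName("es-write-job")
                .set("es.nodes", "docker-host")
                .set("es.port", "9200");

        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();

        // Placeholder input; in the real job "aggregators" is a Dataset of beans
        // that is converted to a DataFrame with toDF() before being written.
        Dataset<Row> aggregators = spark.read().json("aggregators.json");

        JavaEsSparkSQL.saveToEs(aggregators, "collection/record");

        spark.stop();
    }
}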

2) I found that the problem goes away if I add the following line to the node's configuration:

- network.publish_host=${ENV_IP}

3) I then created similar configurations for two additional nodes:

  elasticsearch1:
    build: elasticsearch/
    container_name: es_1
    cap_add:
      - IPC_LOCK
    environment:
      - cluster.name=cp-es-cluster
      - node.name=cloud1
      - node.master=true
      - http.cors.enabled=true
      - http.cors.allow-origin="*"
      - bootstrap.memory_lock=true
      - discovery.zen.minimum_master_nodes=1
      - xpack.security.enabled=false
      - xpack.monitoring.enabled=false
      - "ES_JAVA_OPTS=-Xms1g -Xmx1g"
      - network.publish_host=${ENV_IP}
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - esdata1:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
    networks:
      docker_elk:
        aliases:
          - elasticsearch

  elasticsearch2:
    build: elasticsearch/
    container_name: es_2
    cap_add:
      - IPC_LOCK
    environment:
      - cluster.name=cp-es-cluster
      - node.name=cloud2
      - http.cors.enabled=true
      - http.cors.allow-origin="*"
      - bootstrap.memory_lock=true
      - discovery.zen.minimum_master_nodes=2
      - xpack.security.enabled=false
      - xpack.monitoring.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.zen.ping.unicast.hosts=elasticsearch1"
      - node.master=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - esdata2:/usr/share/elasticsearch/data
    ports:
      - 9201:9200
      - 9301:9300
    networks:
      - docker_elk

  elasticsearch3:
    build: elasticsearch/
    container_name: es_3
    cap_add:
      - IPC_LOCK
    environment:
      - cluster.name=cp-es-cluster
      - node.name=cloud3
      - http.cors.enabled=true
      - http.cors.allow-origin="*"
      - bootstrap.memory_lock=true
      - discovery.zen.minimum_master_nodes=2
      - xpack.security.enabled=false
      - xpack.monitoring.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - "discovery.zen.ping.unicast.hosts=elasticsearch1"
      - node.master=false
    ulimits:
      memlock:
        soft: -1
        hard: -1
      nofile:
        soft: 65536
        hard: 65536
    volumes:
      - esdata3:/usr/share/elasticsearch/data
    ports:
      - 9202:9200
      - 9302:9300
    networks:
      - docker_elk

This brings up a 3-node cluster successfully. However, Spark hits the same error again and cannot write data to the cluster. I get the same behavior even when I add network.publish_host to all of the nodes.
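
For completeness, one way to verify that the three nodes really formed a single cluster is the _cluster/health endpoint on the published HTTP port; a minimal Java sketch of that check, assuming it runs on the Docker host itself:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ClusterHealthCheck {
    public static void main(String[] args) throws Exception {
        // Port 9200 is the mapping of elasticsearch1; expect "number_of_nodes" : 3.
        URL url = new URL("http://localhost:9200/_cluster/health?pretty");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            conn.disconnect();
        }
    }
}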

On the Spark side I am using elasticsearch-spark-20_2.11 version 5.4.0 (matching the ES version). Any ideas how to solve this?

1 Answer:

Answer 0 (score: 1):

I managed to solve the problem. Besides setting es.nodes and es.port in Spark, the issue went away once I also set es.nodes.wan.only to true.
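
A minimal sketch of the corresponding connector settings; the class, method and host names are placeholders, only the es.* keys and values reflect the actual fix:

import org.apache.spark.SparkConf;

public class EsConnectorSettings {
    // Settings that made the write succeed; "docker-host" stands in for the
    // machine where the compose stack publishes port 9200.
    static SparkConf esConf() {
        return new SparkConf()
                .setAppName("es-write-job")
                .set("es.nodes", "docker-host")
                .set("es.port", "9200")
                // Talk only to the declared nodes; do not discover the
                // container-internal publish addresses of the data nodes.
                .set("es.nodes.wan.only", "true");
    }
}

With node discovery disabled, the connector keeps sending requests to the address it was given instead of the Docker-internal addresses the nodes advertise, which is why the connection timeouts disappear.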