Nutch indexing job fails to push to the ELK stack

Time: 2019-05-09 15:38:16

Tags: docker apache-kafka elastic-stack nutch

I am trying to set up an ELK stack that receives indexed documents from Kafka consumers running in a separate Docker container.

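The structure is:
A Kafka producer (run locally for testing) sends a message to the Kafka cluster (http://xxx.xxx.xxx.xxx:9000). The cluster forwards the message to a Kafka consumer (Docker container 1). The consumer runs a Nutch crawl on the URL in the Kafka message. At the end, it should send the resulting index to the ELK stack (Docker container 2), but this step fails.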

Segment dir is complete: crawl2019-05-09150441/segments/20190509150506.
Indexer: starting at 2019-05-09 15:05:56
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
No exchange was configured. The documents will be routed to all index writers.
Active IndexWriters :
ElasticRestIndexWriter:

...

Indexing job did not succeed, job status:FAILED, reason: NA
Indexer: java.lang.RuntimeException: Indexing job did not succeed, job status:FAILED, reason: NA
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:150)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:231)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:240)

Error running:
  /WebContentExtraction/NutchScript/nutch/dist/apache-nutch-1.16-SNAPSHOT-bin/bin/nutch index crawl2019-05-09150441/crawldb -linkdb crawl2019-05-09150441/linkdb crawl2019-05-09150441/segments/20190509150506
Failed with exit value 255.

Both Docker containers are running on localhost, and I am using the default ports for the ELK stack.

Nutch version: 1.16 (master)
ELK version: sebp/elk:622

I am using the elastic.rest indexer and have tried several different versions of the ELK stack.

I am restricted to the master version of Nutch because it has a plugin I need.

Kafka consumer run command:

docker run -dit \
--name kafka1 \
-e HOST=localhost \
-e PORT=9200 \
-e INDEX=nutch \
-e TOPIC=health \
-e INDEXER=elastic.rest \
wce ./startup

ELK stack setup:

docker-compose -f elkDockerComp.yaml up elk

elkDockerComp.yaml:

elk:
  image: sebp/elk:622
  ports:
    - "5601:5601"
    - "9200:9200"
    - "5044:5044"

I think this is a networking problem between the Docker containers, but I can't seem to find a solution.

EDIT

I have added both containers to a Docker network, and a ping from the consumer reaches the ELK stack, so they can now reach each other.

docker network inspect myNet
[
    {
        "Name": "myNet",
        "Id": "ecb47fb89a70ce4979420663287e67c2324dbd3ccf2367ead18261eb3e6089d8",
        "Created": "2019-05-09T15:57:19.3169609Z",
        "Scope": "local",
        "Driver": "bridge",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": {},
            "Config": [
                {
                    "Subnet": "172.19.0.0/16",
                    "Gateway": "172.19.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": false,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "5b3764610f448098124b6a6dc2a28621858795394f3f280e6acc9aed8c13c0bf": {
                "Name": "kafka1",
                "EndpointID": "7380523f048bc516e6b0eb5811f3314804d53e0f758c1ef0aa9f7b049b1f7b4b",
                "MacAddress": "02:42:ac:13:00:02",
                "IPv4Address": "172.19.0.2/16",
                "IPv6Address": ""
            },
            "87b87f2a62abeca39babd592ffc02025397e3d7207eea603168475f578c25002": {
                "Name": "scripts_elk_1",
                "EndpointID": "914aee56fe28f9b0182aeea5a6091ca6c0b6e4ef83f6dcf22564e554047ac858",
                "MacAddress": "02:42:ac:13:00:03",
                "IPv4Address": "172.19.0.3/16",
                "IPv6Address": ""
            }
        },
        "Options": {},
        "Labels": {}
    }
]
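
The container-to-address mapping in this output can also be pulled out programmatically, which helps when the addresses change between runs. This is a minimal sketch, assuming the JSON printed by `docker network inspect` has been captured as a string; `container_addresses` is a hypothetical helper name, not part of Docker or Nutch.

```python
import json


def container_addresses(inspect_output: str) -> dict:
    """Map each container name to its IPv4 address, without the /prefix length."""
    addresses = {}
    # `docker network inspect` prints a JSON list with one object per network.
    for network in json.loads(inspect_output):
        for container in network.get("Containers", {}).values():
            addresses[container["Name"]] = container["IPv4Address"].split("/")[0]
    return addresses
```

For the output above, the result would map kafka1 to 172.19.0.2 and scripts_elk_1 to 172.19.0.3.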

ping (from inside the consumer to ELK)

docker exec -it kafka1 ping 172.19.0.3
PING 172.19.0.3 (172.19.0.3) 56(84) bytes of data.
64 bytes from 172.19.0.3: icmp_seq=1 ttl=64 time=0.228 ms
64 bytes from 172.19.0.3: icmp_seq=2 ttl=64 time=0.125 ms
64 bytes from 172.19.0.3: icmp_seq=3 ttl=64 time=0.128 ms
64 bytes from 172.19.0.3: icmp_seq=4 ttl=64 time=0.138 ms
64 bytes from 172.19.0.3: icmp_seq=5 ttl=64 time=0.155 ms
^C
--- 172.19.0.3 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4146ms
rtt min/avg/max/mdev = 0.125/0.154/0.228/0.041 ms
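
A successful ping only shows ICMP reachability; it does not confirm that Elasticsearch accepts TCP connections on port 9200. A minimal sketch of a TCP-level check, assuming Python is available inside the consumer container (the question does not say so); `can_connect` is a hypothetical helper name.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False
```

Calling `can_connect("172.19.0.3", 9200)` from inside kafka1 would test the ELK address shown in the network inspect output, while `can_connect("localhost", 9200)` would test what the indexer was originally configured to use.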

EDIT 2

I ended up making one big docker-compose file and putting both containers on the same network:

version: "3.7"
services:
  elk:
    image: sebp/elk:670
    container_name: elk1
    ports:
      - "5601:5601"
      - "9200:9200"
      - "5044:5044"
    networks:
      - myNet

  kafka:
    image: wce:latest
    container_name: kafka1
    environment:
      - HOST=172.20.0.2
      - PORT=9200
      - INDEX=nutch
      - TOPIC=health
      - INDEXER=elastic.rest
    command:
      tail -F startup
    networks:
      - myNet

networks:
  myNet:

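It turned out I had forgotten that Docker containers are not localhost relative to each other. I changed the HOST value in index-writer.xml to the new, correct IP, and it now works as expected.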
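
This pitfall can be caught before the indexing job runs. A minimal sketch, assuming the HOST and PORT environment variables from the compose file are visible to a startup script; `is_loopback` and `elasticsearch_url` are hypothetical helpers, not part of Nutch.

```python
import ipaddress
import os


def is_loopback(host: str) -> bool:
    """Return True if host is 'localhost' or a loopback IP literal."""
    if host.lower() == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, e.g. a Docker service name such as "elk".
        return False


def elasticsearch_url(env=os.environ) -> str:
    """Build the Elasticsearch base URL from HOST and PORT, rejecting loopback hosts."""
    host = env.get("HOST", "localhost")
    port = int(env.get("PORT", "9200"))
    if is_loopback(host):
        # Inside a container, loopback refers to that container, not the ELK one.
        raise ValueError(f"HOST={host!r} points at this container, not the ELK container")
    return f"http://{host}:{port}"
```

With the compose file above, `elasticsearch_url()` would return http://172.20.0.2:9200, while the original HOST=localhost would raise an error. On a user-defined Compose network such as myNet, Docker's embedded DNS also resolves service names, so HOST=elk would likely work in place of the hard-coded IP; that was not tested here.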

0 answers