Question

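I have gone through many blogs and sites about configuring Elasticsearch for MongoDB to index collections in MongoDB, but none of them were straightforward.

Please explain a step-by-step process for installing Elasticsearch, which should include:

configuration
running in the browser

I am using Node.js with express.js, so please help accordingly.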

Answer 1

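This answer should be enough to get you set up to follow the tutorial Building a functional search component with MongoDB, Elasticsearch, and AngularJS.

If you want to use faceted search with data from an API, Matthiasn's BirdWatch Repo is worth a look.

So here is how you can set up a single-node Elasticsearch "cluster" to index MongoDB for use in a NodeJS, Express app on a fresh EC2 Ubuntu 14.04 instance.

Make sure everything is up to date.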

sudo apt-get update

Install NodeJS.

sudo apt-get install nodejs
sudo apt-get install npm

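Install MongoDB - these steps are straight from the MongoDB docs. Choose whatever version you're comfortable with. I'm sticking with v2.4.9 because it seems to be the most recent version that MongoDB-River supports without issues.

Import the MongoDB public GPG key.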

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10

Update your sources list.

echo 'deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen' | sudo tee /etc/apt/sources.list.d/mongodb.list

Get the 10gen package.

sudo apt-get install mongodb-10gen

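Then pick your version if you don't want the most recent one. If you are setting up your environment on a Windows 7 or 8 machine, stay away from v2.6 until they work out some bugs with running it as a service.

sudo apt-get install mongodb-10gen=2.4.9

Prevent the version of your MongoDB installation from being bumped up when you update.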

echo "mongodb-10gen hold" | sudo dpkg --set-selections

Start the MongoDB service.

sudo service mongodb start

Your database files default to /var/lib/mongo and your log files to /var/log/mongo.

Create a database through the mongo shell and push some dummy data into it.

mongo YOUR_DATABASE_NAME
db.createCollection("YOUR_COLLECTION_NAME")
for (var i = 1; i <= 25; i++) db.YOUR_COLLECTION_NAME.insert( { x : i } )

Now on to Convert the standalone MongoDB into a Replica Set.

First shut down the process.

mongo YOUR_DATABASE_NAME
use admin
db.shutdownServer()

We're running MongoDB as a service, so we don't pass the "--replSet rs0" option as a command-line argument when we restart the mongod process. Instead, we put it in the mongod.conf file.

vi /etc/mongod.conf

Add these lines, substituting your own db and log paths.

replSet=rs0
dbpath=YOUR_PATH_TO_DATA/DB
logpath=YOUR_PATH_TO_LOG/MONGO.LOG

Now open the mongo shell again to initialize the replica set.

mongo DATABASE_NAME
config = { "_id" : "rs0", "members" : [ { "_id" : 0, "host" : "127.0.0.1:27017" } ] }
rs.initiate(config)
rs.slaveOk() // allows read operations to run on secondary members.

Now install Elasticsearch. I'm just following this helpful Gist.

Make sure Java is installed.

sudo apt-get install openjdk-7-jre-headless -y

Stick with v1.1.x for now, until the Mongo-River plugin bug gets fixed in v1.2.1.

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.1.deb
sudo dpkg -i elasticsearch-1.1.1.deb

curl -L http://github.com/elasticsearch/elasticsearch-servicewrapper/tarball/master | tar -xz
sudo mv *servicewrapper*/service /usr/local/share/elasticsearch/bin/
sudo rm -Rf *servicewrapper*
sudo /usr/local/share/elasticsearch/bin/service/elasticsearch install
sudo ln -s `readlink -f /usr/local/share/elasticsearch/bin/service/elasticsearch` /usr/local/bin/rcelasticsearch

If you're only developing on a single node for now, make sure /etc/elasticsearch/elasticsearch.yml has the following config options enabled:

cluster.name: "MY_CLUSTER_NAME"
node.local: true

Start the Elasticsearch service.

sudo service elasticsearch start

Verify it's working.

curl http://localhost:9200

If you see something like this, then you're good.

{
  "status" : 200,
  "name" : "Chi Demon",
  "version" : {
    "number" : "1.1.2",
    "build_hash" : "e511f7b28b77c4d99175905fac65bffbf4c80cf7",
    "build_timestamp" : "2014-05-22T12:27:39Z",
    "build_snapshot" : false,
    "lucene_version" : "4.7"
  },
  "tagline" : "You Know, for Search"
}

Now install the Elasticsearch plugins so it can work with MongoDB.

bin/plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb/1.6.0
bin/plugin --install elasticsearch/elasticsearch-mapper-attachments/1.6.0

These two plugins aren't necessary, but they're good for testing queries and visualizing changes to your indexes.

bin/plugin --install mobz/elasticsearch-head
bin/plugin --install lukas-vlcek/bigdesk

Restart Elasticsearch.

sudo service elasticsearch restart

Finally, index a collection from MongoDB.

curl -XPUT localhost:9200/_river/DATABASE_NAME/_meta -d '{
  "type": "mongodb",
  "mongodb": {
    "servers": [
      { "host": "127.0.0.1", "port": 27017 }
    ],
    "db": "DATABASE_NAME",
    "collection": "ACTUAL_COLLECTION_NAME",
    "options": { "secondary_read_preference": true },
    "gridfs": false
  },
  "index": {
    "name": "ARBITRARY INDEX NAME",
    "type": "ARBITRARY TYPE NAME"
  }
}'
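Since the question mentions a Node.js/Express app, the same river definition could be assembled in JavaScript before it is sent. This is only a sketch: buildRiverMeta and riverMetaPath are hypothetical helper names, not part of the river plugin, and the example database, collection, index and type names are placeholders. The body simply mirrors the curl payload above.

```javascript
// Hypothetical helper: builds the same JSON body as the curl call above.
function buildRiverMeta(dbName, collectionName, indexName, typeName) {
  return {
    type: 'mongodb',
    mongodb: {
      servers: [{ host: '127.0.0.1', port: 27017 }],
      db: dbName,
      collection: collectionName,
      options: { secondary_read_preference: true },
      gridfs: false
    },
    index: { name: indexName, type: typeName }
  };
}

// The river is registered by PUTting that body to /_river/<name>/_meta.
function riverMetaPath(riverName) {
  return '/_river/' + encodeURIComponent(riverName) + '/_meta';
}

// Placeholder names, for illustration only.
const meta = buildRiverMeta('mydb', 'pages', 'pages_index', 'page');
console.log('PUT ' + riverMetaPath('mydb'));
console.log(JSON.stringify(meta, null, 2));
```

Keeping the body in one function makes it easier to register several collections without copy-pasting JSON.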

Check that your index is in Elasticsearch.

curl -XGET http://localhost:9200/_aliases

Check your cluster health.

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'

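It's probably yellow with some unassigned shards. We have to tell Elasticsearch what we want to work with, which on a single node means zero replicas.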

curl -XPUT 'localhost:9200/_settings' -d '{ "index" : { "number_of_replicas" : 0 } }'

Check cluster health again. It should be green now.

curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
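Why the status goes from yellow to green can be sketched with a deliberately simplified model: a replica copy is not placed on the same node as its primary, so one node with one replica per shard leaves the replicas unassigned. expectedHealth is a hypothetical function for illustration, not an Elasticsearch API, and it ignores everything else the real allocator considers.

```javascript
// Simplified model (an assumption, not Elasticsearch's real allocator):
// a replica copy cannot live on the same node as its primary, so an index
// needs at least (replicas + 1) data nodes for every copy to be assigned.
function expectedHealth(dataNodes, numberOfReplicas) {
  if (dataNodes < 1) return 'red';                        // nothing can hold the primaries
  if (dataNodes < numberOfReplicas + 1) return 'yellow';  // some replicas stay unassigned
  return 'green';
}

console.log(expectedHealth(1, 1)); // single node with one replica per shard
console.log(expectedHealth(1, 0)); // after setting number_of_replicas to 0
```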

Go play.

Answer 2

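Using river can cause problems as your operation scales up. River will use a ton of memory under heavy load. I recommend implementing your own Elasticsearch models, or, if you're using mongoose, building your Elasticsearch models right into it, or using mongoosastic, which essentially does this for you.

Another disadvantage of Mongodb River is that you'll be stuck on the mongodb 2.4.x branch and ElasticSearch 0.90.x. You'll start to find that you're missing out on a lot of really nice features, and the mongodb river project just doesn't produce a usable product fast enough to stay stable. That said, Mongodb River is definitely not something I'd go into production with. It has caused more problems than it's worth. It will randomly drop writes under heavy load, it will consume lots of memory, and there's no setting to cap that. Additionally, river doesn't update in real time: it reads oplogs from mongodb, and in my experience this can delay updates by as long as 5 minutes.

We recently had to rewrite a large portion of our project because something went wrong with ElasticSearch on a weekly basis. We even went as far as hiring a DevOps consultant, who also agreed that it's best to move away from River.

UPDATE: Elasticsearch-mongodb-river now supports ES v1.4.0 and mongodb v2.6.x. However, you'll still likely run into performance problems on heavy insert/update operations, as this plugin will try to read mongodb's oplogs to sync. If there are a lot of operations once the lock (or latch, rather) unlocks, you'll notice extremely high memory usage on your elasticsearch server. If you plan on having a large operation, river is not a good option. The developers of ElasticSearch still recommend that you manage your own indexes by communicating directly with their API using the client library for your language, rather than using river. This isn't really the purpose of river. Twitter-river is a great example of how river should be used. It's essentially a great way to source data from outside sources, but not very reliable for high traffic or internal use.

Also consider that mongodb-river falls behind in version, as it's not maintained by the ElasticSearch organization; it's maintained by a third party. Development was stuck on the v0.90 branch for a long time after the release of v1.0, and when a version for v1.0 was released it wasn't stable until elasticsearch released v1.3.0. Mongodb versions also fall behind. You may find yourself in a tight spot when you're looking to move to a later version of each, especially with ElasticSearch under such heavy development and many highly anticipated features on the way. Staying up to date with the latest ElasticSearch has been very important for us, as we rely heavily on constantly improving our search functionality, which is a core part of our product.

All in all, you'll likely get a better product if you do it yourself. It's not that difficult. It's just another database to manage in your code, and it can easily be dropped into your existing models without major refactoring.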
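To give a rough idea of what "doing it yourself" involves: every time the app saves or removes a MongoDB document (for example from a mongoose hook), it sends a matching request to Elasticsearch. The sketch below only describes those requests. toIndexRequest and toDeleteRequest are hypothetical names, the /index/type/id URL layout assumes an Elasticsearch version that still uses mapping types, and the actual HTTP call is left to whichever client you use.

```javascript
// Hypothetical helpers: describe the HTTP request an app would send to
// Elasticsearch whenever a MongoDB document is saved or removed.
function toIndexRequest(indexName, typeName, doc) {
  // _id goes into the URL; the remaining fields become the document source.
  const { _id, ...source } = doc;
  return {
    method: 'PUT',
    path: '/' + indexName + '/' + typeName + '/' + encodeURIComponent(String(_id)),
    body: JSON.stringify(source)
  };
}

function toDeleteRequest(indexName, typeName, id) {
  return {
    method: 'DELETE',
    path: '/' + indexName + '/' + typeName + '/' + encodeURIComponent(String(id))
  };
}

// Placeholder document, for illustration only.
const saved = { _id: 'abc123', title: 'Hello', categories: ['news'] };
console.log(toIndexRequest('pages', 'page', saved));
console.log(toDeleteRequest('pages', 'page', saved._id));
```

Using the MongoDB _id as the Elasticsearch document id keeps repeated saves idempotent: saving the same document twice overwrites one search document instead of creating two.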

Answer 3

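I found mongo-connector useful. It is from Mongo Labs (MongoDB Inc.) and can now be used with Elasticsearch 2.x.

Elastic 2.x doc manager: https://github.com/mongodb-labs/elastic2-doc-manager

mongo-connector creates a pipeline from a MongoDB cluster to one or more target systems, such as Solr, Elasticsearch, or another MongoDB cluster. It synchronizes data in MongoDB to the target, then tails the MongoDB oplog, keeping up with operations in MongoDB in real time. It has been tested with Python 2.6, 2.7, and 3.3+. Detailed documentation is available on the wiki.

https://github.com/mongodb-labs/mongo-connector
https://github.com/mongodb-labs/mongo-connector/wiki/Usage%20with%20ElasticSearch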
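The idea of tailing the oplog can be illustrated in a few lines of JavaScript. This is not mongo-connector's code (which is Python), just a sketch of the mapping, assuming the classic oplog entry shape with op, ns, o and o2 fields; oplogEntryToAction is a hypothetical name.

```javascript
// Sketch only: not mongo-connector's code. Maps the classic oplog entry
// shape ({ op, ns, o, o2 }) to a description of the matching search action.
function oplogEntryToAction(entry) {
  switch (entry.op) {
    case 'i': // insert: o is the full new document
      return { action: 'index', ns: entry.ns, id: String(entry.o._id), doc: entry.o };
    case 'u': // update: o2 identifies the document, o holds the change
      return { action: 'update', ns: entry.ns, id: String(entry.o2._id), change: entry.o };
    case 'd': // delete: o holds only the _id
      return { action: 'delete', ns: entry.ns, id: String(entry.o._id) };
    default:  // commands ('c'), no-ops ('n') and anything else are skipped
      return null;
  }
}

// Made-up entries, for illustration only.
const sample = [
  { op: 'i', ns: 'mydb.foo', o: { _id: 1, x: 1 } },
  { op: 'u', ns: 'mydb.foo', o2: { _id: 1 }, o: { $set: { x: 2 } } },
  { op: 'n', ns: '', o: { msg: 'periodic noop' } },
  { op: 'd', ns: 'mydb.foo', o: { _id: 1 } }
];
console.log(sample.map(oplogEntryToAction).filter(Boolean));
```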

Answer 4

Here is how to do this on mongodb 3.0. I used this nice blog.

Install mongodb.
Create data directories:

$ mkdir RANDOM_PATH/node1
$ mkdir RANDOM_PATH/node2
$ mkdir RANDOM_PATH/node3

Start the mongod instances:

$ mongod --replSet test --port 27021 --dbpath node1
$ mongod --replSet test --port 27022 --dbpath node2
$ mongod --replSet test --port 27023 --dbpath node3

Configure the replica set:

$ mongo --port 27021
config = {_id: 'test', members: [ {_id: 0, host: 'localhost:27021'}, {_id: 1, host: 'localhost:27022'}]};    
rs.initiate(config);
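Three mongod instances are started above, while the config object lists two of them. If you would rather generate the member list than type it by hand, something like the following could work. buildReplicaSetConfig is a hypothetical helper, and whether the third instance belongs in the set is your call.

```javascript
// Builds the object passed to rs.initiate(): one member per host,
// with _id values assigned in order starting from 0.
function buildReplicaSetConfig(setName, hosts) {
  return {
    _id: setName,
    members: hosts.map(function (host, i) {
      return { _id: i, host: host };
    })
  };
}

const config = buildReplicaSetConfig('test', [
  'localhost:27021',
  'localhost:27022',
  'localhost:27023'
]);
console.log(JSON.stringify(config, null, 2));
```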

Installing Elasticsearch:

a. Download and unzip the latest Elasticsearch distribution

b. Run bin/elasticsearch to start the es server.

c. Run curl -XGET http://localhost:9200/ to confirm it is working.

Install and configure the MongoDB River:

$ bin/plugin --install com.github.richardwilly98.elasticsearch/elasticsearch-river-mongodb

$ bin/plugin --install elasticsearch/elasticsearch-mapper-attachments

Create the "River" and the index:

curl -XPUT 'http://localhost:9200/_river/mongodb/_meta' -d '{ "type": "mongodb", "mongodb": { "db": "mydb", "collection": "foo" }, "index": { "name": "name", "type": "random" } }'

Test in the browser:

http://localhost:9200/_search?q=home
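In a Node.js/Express app you would usually build that URL from user input instead of typing it by hand, and the query then needs to be encoded. A minimal sketch using Node's built-in URL class follows. buildSearchUrl is a hypothetical helper, and the index name in the second call is just the one from the river definition above.

```javascript
// Hypothetical helper: builds a URI-search URL like the one above,
// encoding the user's query so spaces and symbols survive the trip.
function buildSearchUrl(baseUrl, query, indexName) {
  const path = indexName
    ? '/' + encodeURIComponent(indexName) + '/_search'
    : '/_search';
  const url = new URL(path, baseUrl);
  url.searchParams.set('q', query);
  return url.toString();
}

console.log(buildSearchUrl('http://localhost:9200', 'home'));
console.log(buildSearchUrl('http://localhost:9200', 'home & garden', 'name'));
```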

Answer 5

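River is a good solution when you want near-real-time synchronization and a general-purpose solution.

If you already have data in MongoDB and want to ship it to Elasticsearch easily as a "one-shot" import, you can try my Node.js package: https://github.com/itemsapi/elasticbulk.

It uses Node.js streams, so you can import data from anything that supports streams (e.g. MongoDB, PostgreSQL, MySQL, JSON files, etc.).

Example for MongoDB to Elasticsearch: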

Install packages:

npm install elasticbulk
npm install mongoose
npm install bluebird

Create a script, e.g. script.js:

const elasticbulk = require('elasticbulk');
const mongoose = require('mongoose');
const Promise = require('bluebird');
mongoose.connect('mongodb://localhost/your_database_name', {
  useMongoClient: true
});

mongoose.Promise = Promise;

// model bound to an existing collection
var Page = mongoose.model('Page', new mongoose.Schema({
  title: String,
  categories: Array
}), 'your_collection_name');

// stream query
var stream = Page.find({
}, {title: 1, _id: 0, categories: 1}).limit(1500000).skip(0).batchSize(500).stream();

// pipe the stream into Elasticsearch
elasticbulk.import(stream, {
  index: 'my_index_name',
  type: 'my_type_name',
  host: 'localhost:9200',
})
.then(function(res) {
  console.log('Importing finished');
});

Ship your data:

node script.js

It's not extremely fast, but it works for millions of records (thanks to streams).
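For context, bulk imports into Elasticsearch generally go through the _bulk endpoint, whose body is newline-delimited JSON: one action line followed by one source line per document. The sketch below shows that format and a simple batching step. It is not elasticbulk's implementation, and toBulkBody and inBatches are hypothetical names.

```javascript
// Sketch of the _bulk request body format (newline-delimited JSON):
// one action line followed by one source line per document.
function toBulkBody(docs, indexName, typeName) {
  const lines = [];
  for (const doc of docs) {
    lines.push(JSON.stringify({ index: { _index: indexName, _type: typeName } }));
    lines.push(JSON.stringify(doc));
  }
  // The bulk API expects the body to end with a newline.
  return lines.join('\n') + '\n';
}

// Split a large array into batches so each request stays a manageable size.
function inBatches(items, batchSize) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Placeholder documents, for illustration only.
const docs = [{ title: 'a' }, { title: 'b' }, { title: 'c' }];
for (const batch of inBatches(docs, 2)) {
  process.stdout.write(toBulkBody(batch, 'my_index_name', 'my_type_name'));
}
```

Sending a few hundred documents per request rather than one at a time is what makes imports of millions of records practical.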

Answer 6

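Since mongo-connector now appears to be dead, my company decided to build a tool that uses Mongo change streams to output to Elasticsearch.

Our initial results look promising. You can check it out at https://github.com/everyone-counts/mongo-stream. We're still early in development and would welcome suggestions or contributions.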
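As a rough illustration of the change-stream approach (not mongo-stream's actual code), each change event can be mapped to an index, update, or delete operation. The sketch assumes the documented event fields operationType, documentKey, fullDocument and updateDescription; changeEventToAction is a hypothetical name.

```javascript
// Sketch only: not mongo-stream's code. Maps a MongoDB change stream event
// to a description of the operation to apply to the search index.
function changeEventToAction(event) {
  switch (event.operationType) {
    case 'insert':
    case 'replace':
      return { action: 'index', id: String(event.documentKey._id), doc: event.fullDocument };
    case 'update':
      // fullDocument is only present on updates when the stream was opened
      // with { fullDocument: 'updateLookup' }; otherwise use the changed fields.
      return event.fullDocument
        ? { action: 'index', id: String(event.documentKey._id), doc: event.fullDocument }
        : { action: 'update', id: String(event.documentKey._id), fields: event.updateDescription.updatedFields };
    case 'delete':
      return { action: 'delete', id: String(event.documentKey._id) };
    default:
      return null; // e.g. 'invalidate' or 'drop'
  }
}

// Made-up events, for illustration only.
const events = [
  { operationType: 'insert', documentKey: { _id: 7 }, fullDocument: { _id: 7, title: 'a' } },
  { operationType: 'update', documentKey: { _id: 7 }, updateDescription: { updatedFields: { title: 'b' }, removedFields: [] } },
  { operationType: 'delete', documentKey: { _id: 7 } },
  { operationType: 'invalidate' }
];
console.log(events.map(changeEventToAction));
```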

Answer 7

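Here is another good option I found for migrating your MongoDB data to Elasticsearch: a Go daemon that syncs MongoDB to Elasticsearch in real time. It is called Monstache, and it is available at: Monstache

Below are the initial steps to configure and use it.

Step 1: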

C:\Program Files\MongoDB\Server\4.0\bin>mongod --smallfiles --oplogSize 50 --replSet test

Step 2:

C:\Program Files\MongoDB\Server\4.0\bin>mongo
MongoDB shell version v4.0.2
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 4.0.2
Server has startup warnings:
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten]
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten]
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] ** WARNING: This server is bound to localhost.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          Remote systems will be unable to connect to this server.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          Start the server with --bind_ip <address> to specify which IP
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          addresses it should serve responses from, or with --bind_ip_all to
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          bind to all interfaces. If this behavior is desired, start the
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten] **          server with --bind_ip 127.0.0.1 to disable this warning.
2019-01-18T16:56:44.931+0530 I CONTROL  [initandlisten]
MongoDB Enterprise test:PRIMARY>

Step 3: Verify the replication.

MongoDB Enterprise test:PRIMARY> rs.status();
{
        "set" : "test",
        "date" : ISODate("2019-01-18T11:39:00.380Z"),
        "myState" : 1,
        "term" : NumberLong(2),
        "syncingTo" : "",
        "syncSourceHost" : "",
        "syncSourceId" : -1,
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
                "lastCommittedOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                },
                "readConcernMajorityOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                },
                "appliedOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                },
                "durableOpTime" : {
                        "ts" : Timestamp(1547811537, 1),
                        "t" : NumberLong(2)
                }
        },
        "lastStableCheckpointTimestamp" : Timestamp(1547811517, 1),
        "members" : [
                {
                        "_id" : 0,
                        "name" : "localhost:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 736,
                        "optime" : {
                                "ts" : Timestamp(1547811537, 1),
                                "t" : NumberLong(2)
                        },
                        "optimeDate" : ISODate("2019-01-18T11:38:57Z"),
                        "syncingTo" : "",
                        "syncSourceHost" : "",
                        "syncSourceId" : -1,
                        "infoMessage" : "",
                        "electionTime" : Timestamp(1547810805, 1),
                        "electionDate" : ISODate("2019-01-18T11:26:45Z"),
                        "configVersion" : 1,
                        "self" : true,
                        "lastHeartbeatMessage" : ""
                }
        ],
        "ok" : 1,
        "operationTime" : Timestamp(1547811537, 1),
        "$clusterTime" : {
                "clusterTime" : Timestamp(1547811537, 1),
                "signature" : {
                        "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                        "keyId" : NumberLong(0)
                }
        }
}
MongoDB Enterprise test:PRIMARY>

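Step 4. Download the release from https://github.com/rwynn/monstache/releases. Unzip the download and adjust your PATH variable to include the path to the folder for your platform. Open cmd and type "monstache -v" (it printed 4.13.1 here). Monstache uses the TOML format for its configuration. Create the configuration file for the migration, named config.toml.

Step 5.

My config.toml: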

mongo-url = "mongodb://127.0.0.1:27017/?replicaSet=test"
elasticsearch-urls = ["http://localhost:9200"]

direct-read-namespaces = [ "admin.users" ]

gzip = true
stats = true
index-stats = true

elasticsearch-max-conns = 4
elasticsearch-max-seconds = 5
elasticsearch-max-bytes = 8000000 

dropped-collections = false
dropped-databases = false

resume = true
resume-write-unsafe = true
resume-name = "default"
index-files = false
file-highlighting = false
verbose = true
exit-after-direct-reads = false

index-as-update=true
index-oplog-time=true
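The entries in direct-read-namespaces are namespaces of the form database.collection, such as "admin.users" above. If you generate or validate this list from a Node.js script, note that only the first dot separates the two parts, since collection names may themselves contain dots. parseNamespace below is a hypothetical helper, not part of Monstache.

```javascript
// Splits a MongoDB namespace such as "admin.users" into its parts.
// Only the first dot separates database from collection, because
// collection names may themselves contain dots.
function parseNamespace(ns) {
  const dot = ns.indexOf('.');
  if (dot <= 0 || dot === ns.length - 1) {
    throw new Error('expected "database.collection", got: ' + ns);
  }
  return { db: ns.slice(0, dot), collection: ns.slice(dot + 1) };
}

console.log(parseNamespace('admin.users'));
console.log(parseNamespace('mydb.system.profile'));
```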

Step 6.

D:\15-1-19>monstache -f config.toml

How to use Elasticsearch with MongoDB?
