MongoDB documents not fully indexed in Elasticsearch

Time: 2014-05-26 20:08:45

Tags: mongodb elasticsearch

I created a MongoDB collection of documents. Here is an example document:

{
    "_id": ObjectId("53837eed557acd39628b4cdf"),
    "userid": null,
    "importdate": ISODate("2014-05-26T17:50:37.0Z"),
    "documentnumber": "174953-2014",
    "source": "ted",
    "typeoftender": "public",
    "categories": {
        "0": ObjectId("527baa62557acd1669eb992d") 
    },
    "data": {
        "oj": "100",
        "ol": "bg",
        "cy": "bg",
        "dt": ISODate("2014-06-30T22:00:00.0Z"),
        "heading": "01302",
        "ti": {
            "bg": "Услуги по програмиране на системен софтуер и потребителски софтуерни средства",
            "cs": "Programování systémového a uživatelského programového vybavení",
            "da": "Programmeringsservice i forbindelse med systemer og brugerprogrammel",
            "de": "Programmierung von System- und Anwendersoftware",
            "el": "Υπηρεσίες προγραμματισμού λογισμικών συστήματος και χρήστη",
            "en": "Programming services of systems and user software",
            "es": "Servicios de programación de sistemas y software de usuario",
            "et": "Süsteemide ja kasutajatarkvara programmeerimine",
            "fi": "Varus- ja käyttäjäohjelmiston ohjelmointipalvelut",
            "fr": "Services de programmation de systèmes et de logiciels utilitaires",
            "ga": "Programming services of systems and user software",
            "hr": "Usluge programiranja sustava i korisničke podrške",
            "hu": "Rendszer- és felhasználói szoftverek programozási szolgáltatásai",
            "it": "Servizi di programmazione di software di sistemi e di utente",
            "lt": "Programavimo paslaugos, susijusios su sistemomis ir vartotojo programine įranga",
            "lv": "Sistēmu un lietotāju programmatūras programmēšanas pakalpojumi",
            "mt": "Servizzi ta' programmizzar tas-sistemi u tas-software ta' l-utenti",
            "nl": "Programmering van systeem- en gebruikerssoftware",
            "pl": "Usługi programowania oprogramowania systemowego i dla użytkownika",
            "pt": "Serviços de programação de sistemas e de software para o utilizador",
            "ro": "Servicii de programare de sisteme informatice şi software utilitare",
            "sk": "Programovanie systémového a používateľského softvéru",
            "sl": "Storitve programiranja sistemske in uporabniške programske opreme",
            "sv": "Programmering av system- och användarprogram" 
        },
        "tw": {
            "bg": "София",
            "cs": "Sofie",
            "da": "Sofia",
            "de": "Sofia",
            "el": "Σόφια",
            "en": "Sofia",
            "es": "Sofía",
            "et": "Sofia",
            "fi": "Sofia",
            "fr": "Sofia",
            "ga": "Sóifia",
            "hr": "Sofija",
            "hu": "Szófia",
            "it": "Sofia",
            "lt": "Sofija",
            "lv": "Sofija",
            "mt": "Sofija",
            "nl": "Sofia",
            "pl": "Sofia",
            "pt": "Sófia",
            "ro": "Sofia",
            "sk": "Sofia",
            "sl": "Sofija",
            "sv": "Sofia" 
        },
        "rc": "BG411",
        "cpv": {
            "0": "72211000" 
        } 
    },
    "document": {
        "da": "<p>Direktiv 2004\/18\/EF<\/p><div class=\"grseq\"><p class=\"tigrseq\">Del I: Ordregivende myndighed<\/p><div class=\"mlioccur\"><span class=\"nomark\" style=\"col [...]",
        "de": "<p>Richtlinie 2004\/18\/EG<\/p><div class=\"grseq\"><p class=\"tigrseq\">Abschnitt I: Öffentlicher Auftraggeber<\/p><div class=\"mlioccur\"><span class=\"nomark\" [...]",
        "en": "<p>Directive 2004\/18\/EC<\/p><div class=\"grseq\"><p class=\"tigrseq\">Section I: Contracting authority<\/p><div class=\"mlioccur\"><span class=\"nomark\" style= [...]",
        "es": "<p>Directiva 2004\/18\/CE<\/p><div class=\"grseq\"><p class=\"tigrseq\">Apartado I: Poder adjudicador<\/p><div class=\"mlioccur\"><span class=\"nomark\" style=\"co [...]",
        "fi": "<p>Direktiivi 2004\/18\/EY<\/p><div class=\"grseq\"><p class=\"tigrseq\">I kohta: Hankintaviranomainen<\/p><div class=\"mlioccur\"><span class=\"nomark\" style=\"c [...]",
        "fr": "<p>Directive 2004\/18\/CE<\/p><div class=\"grseq\"><p class=\"tigrseq\">Section I: Pouvoir adjudicateur<\/p><div class=\"mlioccur\"><span class=\"nomark\" style=\" [...]",
        "el": "<p>Οδηγία 2004\/18\/ΕΚ<\/p><div class=\"grseq\"><p class=\"tigrseq\">Τμήμα I: Αναθέτουσα αρχή<\/p><div class=\"mlioccur\"><span class=\"nomark\" style=\"color:blac [...]",
        "it": "<p>Direttiva 2004\/18\/CE<\/p><div class=\"grseq\"><p class=\"tigrseq\">Sezione I: Amministrazione aggiudicatrice<\/p><div class=\"mlioccur\"><span class=\"nomar [...]",
        "nl": "<p>Richtlijn 2004\/18\/EG<\/p><div class=\"grseq\"><p class=\"tigrseq\">Afdeling I: Aanbestedende dienst<\/p><div class=\"mlioccur\"><span class=\"nomark\" style= [...]",
        "pt": "<p>Directiva 2004\/18\/CE<\/p><div class=\"grseq\"><p class=\"tigrseq\">Secção I: Autoridade adjudicante<\/p><div class=\"mlioccur\"><span class=\"nomark\" style= [...]",
        "sv": "<p>Direktiv 2004\/18\/EG<\/p><div class=\"grseq\"><p class=\"tigrseq\">Avsnitt I: Upphandlande myndighet<\/p><div class=\"mlioccur\"><span class=\"nomark\" style= [...]",
        "cs": "<p>Směrnice 2004\/18\/ES<\/p><div class=\"grseq\"><p class=\"tigrseq\">Oddíl I: Veřejný zadavatel<\/p><div class=\"mlioccur\"><span class=\"nomark\" style=\"color: [...]",
        "et": "<p>Direktiiv 2004\/18\/EÜ<\/p><div class=\"grseq\"><p class=\"tigrseq\">I osa: Hankija<\/p><div class=\"mlioccur\"><span class=\"nomark\" style=\"color:black\">I.1) [...]",
        "hu": "<p>2004\/18\/EK irányelv<\/p><div class=\"grseq\"><p class=\"tigrseq\">I. szakasz: Ajánlatkérő<\/p><div class=\"mlioccur\"><span class=\"nomark\" style=\"color:bla [...]",
        "lt": "<p>Direktyva 2004\/18\/EB<\/p><div class=\"grseq\"><p class=\"tigrseq\">I dalis: Perkančioji organizacija<\/p><div class=\"mlioccur\"><span class=\"nomark\" style [...]",
        "lv": "<p>Direktīva 2004\/18\/EK<\/p><div class=\"grseq\"><p class=\"tigrseq\">I iedaļa: Līgumslēdzēja iestāde<\/p><div class=\"mlioccur\"><span class=\"nomark\" style=\" [...]",
        "mt": "<p>Direttiva 2004\/18\/KE<\/p><div class=\"grseq\"><p class=\"tigrseq\">Taqsima I: Awtorità kontraenti<\/p><div class=\"mlioccur\"><span class=\"nomark\" style=\"c [...]",
        "pl": "<p>Dyrektywa 2004\/18\/WE<\/p><div class=\"grseq\"><p class=\"tigrseq\">Sekcja I: Instytucja zamawiająca<\/p><div class=\"mlioccur\"><span class=\"nomark\" style= [...]",
        "sk": "<p>Smernica 2004\/18\/ES<\/p><div class=\"grseq\"><p class=\"tigrseq\">Oddiel I: Verejný obstarávateľ<\/p><div class=\"mlioccur\"><span class=\"nomark\" style=\"co [...]",
        "sl": "<p>Direktiva 2004\/18\/ES<\/p><div class=\"grseq\"><p class=\"tigrseq\">Oddelek I: Naročnik<\/p><div class=\"mlioccur\"><span class=\"nomark\" style=\"color:black\" [...]",
        "ga": "<p>Treoir 2004\/18\/CE<\/p><div class=\"grseq\"><p class=\"tigrseq\">Alt I: Údarás conarthachta<\/p><div class=\"mlioccur\"><span class=\"nomark\" style=\"color:bl [...]",
        "bg": "<p>Директива 2004\/18\/ЕО<\/p><div class=\"grseq\"><p class=\"tigrseq\">Раздел І: Възлагащ орган<\/p><div class=\"mlioccur\"><span class=\"nomark\" style=\"color:b [...]",
        "ro": "<p>Directiva 2004\/18\/CE<\/p><div class=\"grseq\"><p class=\"tigrseq\">Secțiunea I: Autoritatea contractantă<\/p><div class=\"mlioccur\"><span class=\"nomark\" s [...]",
        "hr": "<p>Direktiva 2004\/18\/EZ<\/p><div class=\"grseq\"><p class=\"tigrseq\">Odjeljak I.: Javni naručitelj<\/p><div class=\"mlioccur\"><span class=\"nomark\" style=\"co [...]" 
    }
}

Once Elasticsearch has finished indexing, it stores only the following:

{
        _index: tendersidx
        _type: page
        _id: 53837eec557acd39628b4c2b
        _score: 1
        _source: {
            document: {
                da: <p>Direktiv 2004/18/EF</p><div class="grseq"><p class="tigrseq">Del I: Ordregivende myndighed</p><div class="mlioccur"><span class="nomark" style="color:black">I.1)</span><span class="timark" style="font-weight:bold;color:black;">Navn, adresser og kontaktpunkt(er)</span><div class="txtmark" style="color:black"><p><p class="addr">Turun kaupunki<br><br>Linnankatu 55 K, 2 krs. / PL 630<br>20101<br>TurkuFINLAND<br>+358 449075222<br>karolus.haarte@turku.fi</p></p></p><p><p class="ft"><b>Bud eller ansøgninger om deltagelse skal sendes til:</b></p><p class="addr">Turun kaupunki<br><br>https://tarjouspalvelu.fi/turku/?id=17775&tpk=93d33e8c-86aa-40c6-8c60-16d511c61a9a<br></p></p></div></div></span></div><div class="grseq"><p class="tigrseq">Del II: Kontraktens genstand</p><div class="mlioccur"><span class="nomark" style="color:black">II.1)</span><span class="timark" style="font-weight:bold;color:black;">Beskrivelse</span></div></span><div class="mlioccur"><span class="nomark" style="color:black">II.1.6)</span><span class="timark" style="font-weight:bold;color:black;">CPV-glossaret (common procurement vocabulary)</span><div class="txtmark" style="color:black"><p>85000000</p></div></div></span><div class="mlioccur"><span class="nomark" style="color:black"></span><span class="timark" style="font-weight:bold;color:black;">Beskrivelse</span><div class="txtmark" style="color:black"><p>Sundhedsvæsen og sociale foranstaltninger.</p></div></div></span></div><div class="grseq"><p class="tigrseq">Del IV: Procedure</p><div class="mlioccur"><span class="nomark" style="color:black">IV.3)</span><span class="timark" style="font-weight:bold;color:black;">Administrative oplysninger</span></div></span><div class="mlioccur"><span class="nomark" style="color:black">IV.3.3)</span><span class="timark" style="font-weight:bold;color:black;">Vilkår for adgang til specifikationer og yderligere dokumenter eller beskrivende dokumenter</span></div></span><div class="mlioccur"><span 
class="nomark" style="color:black">IV.3.4)</span><span class="timark" style="font-weight:bold;color:black;">Frist for modtagelse af bud eller ansøgninger om deltagelse</span><div class="txtmark" style="color:black"><p>11.8.2014 - 14:00</p></div></div></span><div class="mlioccur"><span class="nomark" style="color:black">IV.3.6)</span><span class="timark" style="font-weight:bold;color:black;">Sprog, der må benyttes ved afgivelse af bud eller ansøgninger om deltagelse</span><div class="txtmark" style="color:black"><p>finsk.</p></div></div></span></div>
            }
            source: ted
            _id: 53837eec557acd39628b4c2b
            documentnumber: 175084-2014
            importdate: 2014-05-26T17:50:36.000Z
            data: {
                dt: 2014-08-10T22:00:00.000Z
                cpv: [
                    85000000
                ]
                cy: fi
                td: 3
                rc: FI183
                ti: {
                    sl: Storitve na področju zdravstva in socialnega varstva
                    hr: Usluge u području zdravstva i socijalne skrbi
                    sk: Zdravotnícka a sociálna pomoc
                    ro: Servicii de sănătate şi servicii de asistenţă socială
                    da: Sundhedsvæsen og sociale foranstaltninger
                    it: Servizi sanitari e di assistenza sociale
                    mt: Servizzi dwar saħħa ta' xogħol soċjali
                    hu: Egészségügyi és szociális gondozási szolgáltatások
                    lv: Veselības un sociālie pakalpojumi
                    lt: Sveikatos priežiūros ir socialinio darbo paslaugos
                    ga: Health and social work services
                    cs: Zdravotní a sociální péče
                    de: Dienstleistungen des Gesundheits- und Sozialwesens
                    el: Υγειονομικές και κοινωνικές υπηρεσίες
                    fi: Terveyspalvelut ja sosiaalitoimen palvelut
                    pt: Serviços de saúde e acção social
                    pl: Usługi w zakresie zdrowia i opieki społecznej
                    sv: Hälso- och sjukvård samt socialvård
                    bg: Услуги на здравеопазването и социалните дейности
                    fr: Services de santé et services sociaux
                    en: Health and social work services
                    et: Tervishoiu ja sotsiaaltöö teenused
                    es: Servicios de salud y asistencia social
                    nl: Gezondheidszorg en maatschappelijk werk
                }
                ty: 1
                nc: 4
                tw: {
                    sl: Turku
                    hr: Turku
                    sk: Turku
                    ro: Turku
                    da: Turku
                    it: Turku
                    mt: Turku
                    hu: Turku
                    lv: Turku
                    lt: Turku
                    ga: Turku
                    cs: Turku
                    de: Turku
                    el: Turku
                    fi: Turku
                    pt: Turku
                    pl: Turku
                    sv: Åbo
                    bg: Турку
                    fr: Turku
                    en: Turku
                    et: Turu
                    es: Turku
                    nl: Turku
                }
                ol: fi
                oj: 100
                ds: 0.00000000 1400623200
                pr: 1
                heading: 01302
            }
            userid: null
            categories: false
            typeoftender: public
        }
}

As you can see, Elasticsearch indexed only part of "document", namely the "da" element.
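One way to confirm exactly which language entries are missing is to compare the keys of `document` in the MongoDB document with those in the indexed `_source` (ideally for the same `_id`). The sketch below uses a hypothetical helper, `missing_languages`, that is not part of MongoDB, Elasticsearch or the river; it works on plain dicts and assumes you have already fetched both documents, for example with a MongoDB driver and a `GET` on the index.

```python
def missing_languages(mongo_doc: dict, es_hit: dict, field: str = "document") -> list:
    """Return the language keys present in MongoDB but absent from the indexed _source."""
    in_mongo = set(mongo_doc.get(field, {}))
    in_es = set(es_hit.get("_source", {}).get(field, {}))
    return sorted(in_mongo - in_es)


# Trimmed-down stand-ins for the two documents above (HTML bodies replaced by placeholders).
mongo_doc = {"document": {"da": "<p>...</p>", "de": "<p>...</p>", "en": "<p>...</p>"}}
es_hit = {"_index": "tendersidx", "_source": {"document": {"da": "<p>...</p>"}}}
print(missing_languages(mongo_doc, es_hit))  # ['de', 'en']
```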

The index was created with the following command:

curl -XPUT "localhost:9200/_river/tenders/_meta" -d '
{
    "type": "mongodb",
    "mongodb": {
        "servers": [
            { "host": "127.0.0.1", "port": 27017 }
        ],
        "options": { "secondary_read_preference": true },
        "db": "tenderdb",
        "collection": "tenders"
    },
    "index": {
        "name": "tendersidx",
        "type": "page"
    }
}'
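For reference, the same river definition can be assembled from Python with only the standard library. This sketch builds, but does not send, the `PUT` request to `/_river/tenders/_meta` with the body from the command above; the host `localhost:9200` comes from that command and the helper name `build_river_request` is hypothetical. Note that the `_river` endpoint only exists in old releases such as the 1.1.0 used here; rivers were removed in Elasticsearch 2.0.

```python
import json
import urllib.request


def build_river_request(host: str = "localhost:9200") -> urllib.request.Request:
    """Build (but do not send) the PUT that registers the MongoDB river."""
    body = {
        "type": "mongodb",
        "mongodb": {
            "servers": [{"host": "127.0.0.1", "port": 27017}],
            "options": {"secondary_read_preference": True},
            "db": "tenderdb",
            "collection": "tenders",
        },
        "index": {"name": "tendersidx", "type": "page"},
    }
    return urllib.request.Request(
        f"http://{host}/_river/tenders/_meta",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_river_request()
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/_river/tenders/_meta
# To register the river against a running node: urllib.request.urlopen(req)
```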

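The process that populates the database is:
1) download the data from the server
2) extract the downloaded data
3) insert the data into the MongoDB collection
4) download the metadata from the server (this part contains the "document" information)
5) extract the downloaded metadata
6) insert the extracted metadata into the MongoDB collection.

The metadata is stored in several files, one per language. "da" (Danish) is the first file inserted.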
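Step 6 can be pictured as follows, assuming the importer stores each language file under a `document.<lang>` path of the tender identified by `documentnumber`. The sketch only builds MongoDB update specifications (plain dicts using the standard `$set` operator with dotted field paths); `per_language_updates` and `combined_update` are hypothetical names and no driver call is made. With one update per file, each language reaches MongoDB, and therefore the oplog that the river reads, as a separate write; the combined form writes all languages at once.

```python
def per_language_updates(documentnumber: str, files: dict) -> list:
    """One (filter, update) pair per language file, i.e. one write per file."""
    return [
        ({"documentnumber": documentnumber}, {"$set": {f"document.{lang}": html}})
        for lang, html in files.items()
    ]


def combined_update(documentnumber: str, files: dict) -> tuple:
    """A single (filter, update) pair that sets every language in one write."""
    fields = {f"document.{lang}": html for lang, html in files.items()}
    return {"documentnumber": documentnumber}, {"$set": fields}


# Placeholder contents; with PyMongo one would pass a pair to collection.update_one(filter, update).
files = {"da": "<p>Direktiv ...</p>", "de": "<p>Richtlinie ...</p>"}
print(len(per_language_updates("174953-2014", files)))  # 2
print(combined_update("174953-2014", files)[1])
```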

MongoDB: 2.6.1

Elasticsearch: 1.1.0

Plugins:

elasticsearch-mapper-attachments version 2.0.0
elasticsearch-river-mongodb version 2.0.0

Does anyone know why the entries in the Mongo "document" other than "da" are not available in the Elasticsearch dataset?

1 answer:

Answer 0 (score: 0)

Elasticsearch and Java were running out of memory. Increasing the amount of memory allocated to Java solved the problem for me.
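Before raising the memory, it can help to confirm that the heap really is the bottleneck. In Elasticsearch 1.x the node statistics endpoint (`GET /_nodes/stats/jvm`) reports a `jvm.mem.heap_used_percent` value per node, and the heap size is usually raised by setting the `ES_HEAP_SIZE` environment variable before starting the node. The sketch below only parses an already-fetched stats response; the helper name `nodes_near_heap_limit`, the sample response, the node ids and the 75% threshold are all made-up assumptions, not values from the question.

```python
def nodes_near_heap_limit(stats: dict, threshold: int = 75) -> list:
    """Return names of nodes whose JVM heap usage is at or above `threshold` percent."""
    flagged = []
    for node_id, node in stats.get("nodes", {}).items():
        used = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        if used is not None and used >= threshold:
            flagged.append(node.get("name", node_id))
    return flagged


# Hypothetical, trimmed response from GET /_nodes/stats/jvm
sample = {
    "nodes": {
        "abc123": {"name": "node-1", "jvm": {"mem": {"heap_used_percent": 92}}},
        "def456": {"name": "node-2", "jvm": {"mem": {"heap_used_percent": 40}}},
    }
}
print(nodes_near_heap_limit(sample))  # ['node-1']
```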