I am following the online example for importing the json.gz Wikipedia dump into Elasticsearch: https://www.elastic.co/blog/loading-wikipedia.
After running the following:
curl -s 'https://'$site'/w/api.php?action=cirrus-mapping-dump&format=json&formatversion=2' |
jq .content |
sed 's/"index_analyzer"/"analyzer"/' |
sed 's/"position_offset_gap"/"position_increment_gap"/' |
curl -XPUT $es/$index/_mapping/page?pretty -d @-
I get this error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "Unknown Similarity type [arrays] for field [category]"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "Unknown Similarity type [arrays] for field [category]"
  },
  "status" : 400
}
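Does anyone have an idea? I cannot ingest the Wikipedia content with the method described. I hope the company at least updates its tutorial page.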
Answer:
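If you look at the URI, it dumps the mapping of Wikimedia's production index, which is based on Elastic 2.x and depends on that index's own settings. For example, [arrays] in the error is not a built-in similarity: it has to be declared in the index settings, and PUTting only the mapping does not declare it. I suggest you:
1. Manually download the wiki dump of the production Elasticsearch index: http://dumps.wikimedia.org/other/cirrussearch/current/
2. Create a mapping that fits your needs, replacing features that are deprecated or removed in Elastic 5.x. For example: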
{
  "mappings": {
    "page": {
      "properties": {
        "auxiliary_text": {
          "type": "text"
        },
        "category": {
          "type": "text"
        },
        "coordinates": {
          "properties": {
            "coord": {
              "properties": {
                "lat": {
                  "type": "double"
                },
                "lon": {
                  "type": "double"
                }
              }
            },
            "country": {
              "type": "text"
            },
            "dim": {
              "type": "long"
            },
            "globe": {
              "type": "text"
            },
            "name": {
              "type": "text"
            },
            "primary": {
              "type": "boolean"
            },
            "region": {
              "type": "text"
            },
            "type": {
              "type": "text"
            }
          }
        },
        "defaultsort": {
          "type": "boolean"
        },
        "external_link": {
          "type": "text"
        },
        "heading": {
          "type": "text"
        },
        "incoming_links": {
          "type": "long"
        },
        "language": {
          "type": "text"
        },
        "namespace": {
          "type": "long"
        },
        "namespace_text": {
          "type": "text"
        },
        "opening_text": {
          "type": "text"
        },
        "outgoing_link": {
          "type": "text"
        },
        "popularity_score": {
          "type": "double"
        },
        "redirect": {
          "properties": {
            "namespace": {
              "type": "long"
            },
            "title": {
              "type": "text"
            }
          }
        },
        "score": {
          "type": "double"
        },
        "source_text": {
          "type": "text"
        },
        "template": {
          "type": "text"
        },
        "text": {
          "type": "text"
        },
        "text_bytes": {
          "type": "long"
        },
        "timestamp": {
          "type": "date",
          "format": "strict_date_optional_time||epoch_millis"
        },
        "title": {
          "type": "text"
        },
        "version": {
          "type": "long"
        },
        "version_type": {
          "type": "text"
        },
        "wiki": {
          "type": "text"
        },
        "wikibase_item": {
          "type": "text"
        }
      }
    }
  }
}
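Rather than writing every field by hand, you could derive a starting point from one document of the dump. The following is only a minimal sketch: infer_field and infer_mapping are hypothetical helpers, not part of Elasticsearch or CirrusSearch. It maps JSON booleans to boolean, integers to long, floats to double, strings to text and objects to nested properties, as in the example above. It does not detect dates, so a field like timestamp still has to be switched to date by hand, and since a single sample only shows one value per field, fields whose type varies between documents may need adjusting too.

```python
import json


def infer_field(value):
    """Return an Elasticsearch field definition for one JSON value (sketch)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "double"}
    if isinstance(value, str):
        return {"type": "text"}
    if isinstance(value, dict):
        # Objects become nested "properties", like "coordinates" and "redirect" above.
        return {"properties": {k: infer_field(v) for k, v in sorted(value.items())}}
    if isinstance(value, list):
        # Elasticsearch has no array type: an array is mapped like its elements.
        # Assumption: an empty list (or null below) falls back to text.
        return infer_field(value[0]) if value else {"type": "text"}
    return {"type": "text"}


def infer_mapping(doc, doc_type="page"):
    """Build a create-index body in the 5.x layout used above: mappings -> type -> properties."""
    return {"mappings": {doc_type: infer_field(doc)}}
```

To get the sample document, open the dump with gzip.open(path, "rt", encoding="utf-8"), skip the first line (the bulk action line), pass json.loads of the second line to infer_mapping, and print the result with json.dumps(..., indent=2) to review and edit it.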
After creating the index with this mapping (enwiki in this example), you just run:
zcat enwiki-current-cirrussearch-general.json.gz | parallel --pipe -L 2 -N 2000 -j3 'curl -s -H "Content-Type: application/x-ndjson" http://localhost:9200/enwiki/_bulk --data-binary @- > /dev/null'
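The pipeline above sends the dump to the _bulk endpoint in chunks of 2,000 documents, where each document takes two lines (an action line and a source line), three requests at a time. As a rough sketch of the same steps in Python, including creating the index, assuming a local node at http://localhost:9200 and the index name enwiki: create_index, bulk_batches, post_bulk and load_dump are hypothetical helper names. The sketch sends batches one at a time instead of three in parallel, and instead of discarding the responses to /dev/null it reports batches in which Elasticsearch flagged errors. The "page" type in the mapping assumes Elastic 5.x, as in the answer.

```python
import gzip
import json
import urllib.request
from itertools import islice

ES_URL = "http://localhost:9200"  # assumption: a local node
INDEX = "enwiki"                  # index name used in the answer


def create_index(body, es_url=ES_URL, index=INDEX):
    """PUT /<index> with a create-index body such as the mapping above."""
    request = urllib.request.Request(
        f"{es_url}/{index}", data=json.dumps(body).encode("utf-8"), method="PUT",
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def bulk_batches(lines, docs_per_batch=2000):
    """Yield _bulk request bodies of docs_per_batch documents each.

    Each document is two lines (action + source), so one batch is
    2 * docs_per_batch lines, like `parallel --pipe -L 2 -N 2000`."""
    lines = iter(lines)
    while True:
        chunk = list(islice(lines, 2 * docs_per_batch))
        if not chunk:
            return
        # _bulk requires every line, including the last one, to end with a newline.
        yield "".join(line if line.endswith("\n") else line + "\n"
                      for line in chunk).encode("utf-8")


def post_bulk(body, es_url=ES_URL, index=INDEX):
    """POST one NDJSON body to /<index>/_bulk and return the parsed response."""
    request = urllib.request.Request(
        f"{es_url}/{index}/_bulk", data=body, method="POST",
        headers={"Content-Type": "application/x-ndjson"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def load_dump(path, es_url=ES_URL, index=INDEX):
    """Stream a gzipped CirrusSearch dump into the index, one batch at a time."""
    with gzip.open(path, "rt", encoding="utf-8") as dump:
        for number, body in enumerate(bulk_batches(dump)):
            result = post_bulk(body, es_url, index)
            if result.get("errors"):
                # The bulk response sets "errors": true if any item in the batch failed.
                print(f"batch {number}: some documents were rejected")
```

For example, load the mapping above from a file (mapping.json is a hypothetical name) with json.load, pass it to create_index, then call load_dump("enwiki-current-cirrussearch-general.json.gz").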