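I have a scenario where I need to retrieve millions of records from Elasticsearch.
I am a beginner with Elasticsearch and cannot yet use it very efficiently.
I index an Author model in Elasticsearch as shown below, and I use the NEST client to work with Elasticsearch from a .NET application.
My model is described below.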
Author
--------------------------------
AuthorKey string
List<Study> Nested
Study
---------------------------------
PMID int
PublicationDate date
PublicationType string
MeshTerms string
Content string
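We have nearly 10 million authors, and each author has completed at least 3 studies.
So roughly 30 million study records are available in the Elasticsearch index.
Now I want to get each author's data together with their total number of studies.
Below is sample JSON data: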
{
  "Authors": [
    {
      "AuthorKey": "Author1",
      "AuthorName": "karan",
      "AuthorLastName": "shah",
      "Study": [
        {
          "PMId": 1000,
          "PublicationDate": "2019-01-17T06:35:52.178Z",
          "content": "this is dummy content.how can i solve this",
          "MeshTerms": "karan,dharan,nilesh,manan,mehul sir,manoj",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        },
        {
          "PMId": 1001,
          "PublicationDate": "2019-01-16T05:55:14.947Z",
          "content": "this is dummy content.how can i solve this",
          "MeshTerms": "karan1,dharan1,nilesh1,manan1,mehul1 sir,manoj1",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        },
        {
          "PMId": 1002,
          "PublicationDate": "2019-01-15T05:55:14.947Z",
          "content": "this is dummy content for record2.how can i solve this",
          "MeshTerms": "karan2,dharan2,nilesh2,manan2,mehul2 sir,manoj2",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical2"
          ]
        },
        {
          "PMId": 1003,
          "PublicationDate": "2011-01-15T05:55:14.947Z",
          "content": "this is dummy content for record3.how can i solve this",
          "MeshTerms": "karan3,dharan3,nilesh3,manan3,mehul3 sir,manoj3",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical3"
          ]
        }
      ]
    },
    {
      "AuthorKey": "Author2",
      "AuthorName": "dharan",
      "AuthorLastName": "shah",
      "Study": [
        {
          "PMId": 2001,
          "PublicationDate": "2011-01-16T05:55:14.947Z",
          "content": "this is dummy content for author 2.how can i solve this",
          "MeshTerms": "karan1,dharan1,nilesh1,manan1,mehul1 sir,manoj1",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        },
        {
          "PMId": 2002,
          "PublicationDate": "2019-01-15T05:55:14.947Z",
          "content": "this is dummy content for author 2.how can i solve this",
          "MeshTerms": "karan2,dharan2,nilesh2,manan2,mehul2 sir,manoj2",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical2"
          ]
        },
        {
          "PMId": 2003,
          "PublicationDate": "2015-01-15T05:55:14.947Z",
          "content": "this is dummy content for record2.how can i solve this",
          "MeshTerms": "karan3,dharan3,nilesh3,manan3,mehul3 sir,manoj3",
          "PublicationType": [
            "ClinicalTrial1",
            "Medical3"
          ]
        }
      ]
    },
    {
      "AuthorKey": "Author3",
      "AuthorName": "Nilesh",
      "AuthorLastName": "Mistrey",
      "Study": [
        {
          "PMId": 3000,
          "PublicationDate": "2012-01-16T05:55:14.947Z",
          "content": "this is dummy content for author 2 .how can i solve this",
          "MeshTerms": "karan2,dharan2,nilesh2,manan2,mehul sir2,manoj2",
          "PublicationType": [
            "ClinicalTrial",
            "Medical"
          ]
        }
      ]
    }
  ]
}
How can I retrieve all authors together with their total number of studies, sorted from highest to lowest?
Expected output:
{
  "Authors": [
    {
      "AuthorKey": "Author1",
      "AuthorName": "karan",
      "AuthorLastName": "shah",
      "StudyCount": 4
    },
    {
      "AuthorKey": "Author2",
      "AuthorName": "dharan",
      "AuthorLastName": "shah",
      "StudyCount": 3
    },
    {
      "AuthorKey": "Author3",
      "AuthorName": "Nilesh",
      "AuthorLastName": "Mistrey",
      "StudyCount": 1
    }
  ]
}
Below is the mapping of the index:
{
  "authorindex": {
    "mappings": {
      "_doc": {
        "properties": {
          "AuthorKey": {
            "type": "keyword"
          },
          "AuthorLastName": {
            "type": "keyword"
          },
          "AuthorName": {
            "type": "keyword"
          },
          "Study": {
            "type": "nested",
            "properties": {
              "MeshTerms": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "PMId": {
                "type": "long"
              },
              "PublicationDate": {
                "type": "date"
              },
              "PublicationType": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              },
              "content": {
                "type": "text",
                "fields": {
                  "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Answer 0 (score: 0)
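There are two ways to solve this:
1) Use a script, for example a script-based sort;
2) Precompute the number of studies per author, store it in the index as a plain integer, and sort the results on it.
Depending on the situation you are facing, either of these options may work for you.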
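Option 1) will work if you need to experiment with the data and run ad hoc queries. It does not perform well, but it should work with the existing data and mapping.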
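As a rough illustration of option 1), here is a minimal sketch that only builds the search request body as a plain Python dict. It assumes that sort scripts in your Elasticsearch version can read the document through `params._source`, and that every author document has a `Study` array; both are assumptions you should verify. The helper name `build_script_sort_query` and the `page_size` parameter are made up for this example.

```python
import json


def build_script_sort_query(page_size=100):
    """Build a search body that sorts authors by the length of their Study array.

    Assumption: the sort script may read `params._source`. This loads the
    source of every matching document, which is why the approach is slow.
    """
    return {
        "size": page_size,
        # Return only the author fields, not the nested studies.
        "_source": ["AuthorKey", "AuthorName", "AuthorLastName"],
        "query": {"match_all": {}},
        "sort": [
            {
                "_script": {
                    "type": "number",
                    "order": "desc",
                    "script": {
                        "lang": "painless",
                        # Fails for documents without a Study array.
                        "source": "params._source.Study.size()",
                    },
                }
            }
        ],
    }


if __name__ == "__main__":
    # Print the body so it can be sent to the _search endpoint of authorindex.
    print(json.dumps(build_script_sort_query(), indent=2))
```

If the script works as assumed, the computed value should come back in the `sort` array of each hit, which you could map to `StudyCount` on the client side.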
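Option 2), by contrast, requires a full reindex and an extra step before sending data to Elasticsearch (which is still fairly easy). On the plus side, it guarantees the best performance.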
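The extra step for option 2) could look roughly like the sketch below. It is written in Python only for illustration (the same idea applies in a .NET indexing pipeline) and assumes that a new integer field named `StudyCount`, taken from the expected output in the question, is added to each author document before indexing. The helper `with_study_count` and the constant `SORT_BY_STUDY_COUNT` are hypothetical names.

```python
def with_study_count(author):
    """Return a copy of an author document with a precomputed StudyCount."""
    enriched = dict(author)
    # Count the nested studies once, at indexing time.
    enriched["StudyCount"] = len(author.get("Study", []))
    return enriched


# Search body that sorts on the stored integer instead of running a script.
SORT_BY_STUDY_COUNT = {
    "_source": ["AuthorKey", "AuthorName", "AuthorLastName", "StudyCount"],
    "query": {"match_all": {}},
    "sort": [{"StudyCount": {"order": "desc"}}],
}


if __name__ == "__main__":
    sample = {
        "AuthorKey": "Author3",
        "AuthorName": "Nilesh",
        "AuthorLastName": "Mistrey",
        "Study": [{"PMId": 3000}],
    }
    # The enriched document is what would be sent to Elasticsearch.
    print(with_study_count(sample)["StudyCount"])
```

The count has to be kept in sync whenever studies are added to or removed from an author, which is the price for the faster sort.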
You can read about other ways of handling relationships in Elasticsearch in the Handling relationships chapter of The Definitive Guide.
Hope that helps!