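I'm having a performance problem when working with an array of strings. Basically, I need to count how many times each element of an array field appears across my documents.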
For example:
Doc 1 [.... Facets: ["Academia", "Piscina", "Cinema"]]
Doc 2 [.... Facets: ["Academia", "Cozinha", "Cinema"]]
Doc 3 [.... Facets: ["Cooper", "Quadra de Futebol", "Cozinha", "Cinema"]]
So my result would be:
Academia: 2
Piscina: 1
Cinema: 3
Cozinha: 2
Quadra de Futebol: 1
Cooper: 1
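The counting itself is what `$unwind` followed by `$sortByCount` does on the server. As a rough illustration only, here is a plain JavaScript sketch (no MongoDB involved) that applies the same logic to the three example documents above. The function name `countFacets` is hypothetical, and ties are broken alphabetically purely to make the output deterministic; `$sortByCount` itself does not promise any order among equal counts.

```javascript
// Count how often each facet appears across documents, then sort by count
// descending, mirroring $unwind + $sortByCount. Ties are broken by name,
// which is an assumption made here only for deterministic output.
function countFacets(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // $unwind emits one document per array element; by default a missing,
    // null or empty Facets array produces no output documents at all.
    for (const facet of doc.Facets ?? []) {
      counts.set(facet, (counts.get(facet) ?? 0) + 1);
    }
  }
  return [...counts]
    .map(([_id, count]) => ({ _id, count }))
    .sort((a, b) => b.count - a.count || a._id.localeCompare(b._id));
}

const docs = [
  { Facets: ["Academia", "Piscina", "Cinema"] },
  { Facets: ["Academia", "Cozinha", "Cinema"] },
  { Facets: ["Cooper", "Quadra de Futebol", "Cozinha", "Cinema"] },
];

console.log(countFacets(docs));
```

The `{ _id, count }` shape matches the documents `$sortByCount` returns, so the sketch can double as a reference when checking real results.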
Sample document:
{
"_id" : ObjectId("5bab1d5e2172eda710338c5c"),
"SiteID" : "VR_1038936695_1",
"PriceSale" : 580000.0,
"Title" : "Apartamento a Venda em Salvador, Pituba, 4 dormitórios, 2 suítes, 4 banheiros, 2 vagas",
"Description" : "Apartamento 44 dormitórios (sendo 2 suítes), banheiros, 2 garagens, dependência de empregada, sala integrada à varanda.andar alto, 119 mº. Condomínio com infraestrutura completa: Piscina, quadra poliesportiva, academia, salão de festas, brinquedoteca, parque infantil, salão de jogos, playground com bastante área. Localização: Próximo ao Hiper Ideal, escolas, faculdade, Mini Shopping, etc... <br> <br> OPORTUNIDADE!!! <br> <br> Agende Sua Visita!!! <br> <br> <br> - Ar Condicionado <br> - Móveis Planejados <br> - Portão Eletrônico <br> - Área de Serviço <br> - Cozinha <br> - Bares e Restaurantes <br> - Escola <br> - Farmácia <br> - Shopping Center <br> - Supermercado",
"Link" : "https://www.vivareal.com.br/imovel/apartamento-4-quartos-pituba-bairros-salvador-com-garagem-119m2-venda-RS580000-id-1038936695/",
"QtyRoomsMin" : 4.0,
"QtyRoomsMax" : 4.0,
"QtySuitesMin" : 2.0,
"QtySuitesMax" : 2.0,
"QtyParkingSlotMin" : 2.0,
"QtyParkingSlotMax" : 2.0,
"AreaMin" : 119.0,
"AreaMax" : 119.0,
"QtyBathroomsMin" : 4.0,
"QtyBathroomsMax" : 4.0,
"SiteOrigin" : NumberInt(3),
"Type" : NumberInt(1),
"Subtype" : NumberInt(7),
"UpdateDate" : ISODate("2018-10-24T00:00:51.553+0000"),
"SortOrder" : NumberInt(280),
"IdDistrict" : NumberInt(1876),
"DistrictName" : "Pituba",
"IdCity" : NumberInt(988),
"CityName" : "Salvador",
"IdState" : NumberInt(5),
"StateName" : "Bahia",
"UF" : "BA",
"FullAddress" : "Rua Ceará",
"ZipCode" : NumberInt(41830450),
"Latitude" : null,
"Longitude" : null,
"IdTransaction" : NumberInt(1),
"ExpireAt" : ISODate("2018-11-12T23:00:51.553+0000"),
"Facets" : [
"Academia",
"Ar Condicionado",
"Área de Serviço",
"Cozinha",
"Espaço Verde / Parque",
"Piscina",
"Quadra Poliesportiva",
"Salão de jogos",
"Garagem"
]
}
Code in C#:
var pipeline = this.Collection.Aggregate(new AggregateOptions { AllowDiskUse = true })
    .Match(filter)
    .Unwind(x => x.Facets)
    .SortByCount("$Facets");
List listFacets = new List();
var output = pipeline.ToList();
The same query in MongoDB:
db.Property.aggregate([
{
"$match": {
"Subtype": {
"$in": [
7
]
},
"IdTransaction": 1,
"IdDistrict": {
"$in": [
25938
]
},
"IdCity": 7994
}
},
{
"$unwind": "$Facets"
},
{
"$sortByCount": "$Facets"
}
])
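For reference, the MongoDB documentation describes `$sortByCount` as shorthand for a `$group` stage that counts documents per value, followed by a `$sort` on that count in descending order. The sketch below builds both forms of the pipeline from the same `$match` filter as plain JavaScript objects; the helper name `buildPipeline` and its `expandSortByCount` option are hypothetical, and the script is meant to run under Node.js.

```javascript
// Build the aggregation pipeline from the question, either with
// $sortByCount or with its documented $group + $sort equivalent.
function buildPipeline(match, { expandSortByCount = false } = {}) {
  const countStages = expandSortByCount
    ? [
        { $group: { _id: "$Facets", count: { $sum: 1 } } },
        { $sort: { count: -1 } },
      ]
    : [{ $sortByCount: "$Facets" }];
  return [{ $match: match }, { $unwind: "$Facets" }, ...countStages];
}

// Same filter values as the query above.
const match = {
  Subtype: { $in: [7] },
  IdTransaction: 1,
  IdDistrict: { $in: [25938] },
  IdCity: 7994,
};

console.log(JSON.stringify(buildPipeline(match, { expandSortByCount: true }), null, 2));
```

Both arrays describe the same computation; the expanded form simply makes the grouping and sorting steps explicit, which can make an execution plan easier to read.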
This query takes 1070 ms. I have some cases that take 10774 ms, all of them using IXSCAN :(
My collection has 9 million documents.
Here is the profiler log for one query. The query uses IXSCAN, but I read an article (https://lamada.eu/blog/2016/11/08/troubleshooting-mongodb-queries-performance/) saying that for a perfect IXSCAN we need keysExamined = nReturned = docsExamined.
Looking at my results, I'm not getting an optimal index.
How can I improve this query?
{
"op": "command",
"ns": "SonarImovel.Property",
"command": {
"aggregate": "Property",
"pipeline": [
{
"$match": {
"Subtype": {
"$in": [
13
]
},
"IdTransaction": 1,
"IdDistrict": {
"$in": [
25938
]
},
"IdCity": 7994
}
},
{
"$unwind": "$Facets"
},
{
"$sortByCount": "$Facets"
}
],
"cursor": {
},
"$db": "SonarImovel",
"lsid": {
"id": UUID("6698f309-4f40-4b77-92bb-fc2a8a99efba")
}
},
"keysExamined": 2638,
"docsExamined": 2638,
"hasSortStage": true,
"cursorExhausted": true,
"numYield": 71,
"locks": {
"Global": {
"acquireCount": {
"r": NumberLong(146)
}
},
"Database": {
"acquireCount": {
"r": NumberLong(73)
}
},
"Collection": {
"acquireCount": {
"r": NumberLong(73)
}
}
},
"nreturned": 39,
"responseLength": 1707,
"protocol": "op_msg",
"millis": 1070,
"planSummary": "IXSCAN { Subtype: 1, IdCity: 1, IdTransaction: 1, IdDistrict: 1, SortOrder: 1 }"
}
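The ratio check from the article can be applied directly to the logged numbers. The sketch below copies the values from the profiler entry; the interpretation of `nreturned` in the comments (that for an `aggregate` command it counts the documents returned by the whole pipeline, here the facet groups, rather than the documents selected by `$match`) is an assumption about how aggregations are profiled, not something the log itself states.

```javascript
// Values copied from the profiler entry above.
const profile = {
  keysExamined: 2638,
  docsExamined: 2638,
  nreturned: 39,
  millis: 1070,
};

// Index keys scanned per document fetched: 1 means each key examined
// led to exactly one document fetch.
const keysPerDoc = profile.keysExamined / profile.docsExamined;

// Assumption: for an aggregate command, nreturned counts the pipeline's
// output (groups after $sortByCount), so this ratio reflects how many
// unwound documents collapse into each group, not index selectivity.
const docsPerReturned = profile.docsExamined / profile.nreturned;

console.log(keysPerDoc);                 // 1
console.log(docsPerReturned.toFixed(1)); // 67.6
```

Under that assumption, the 1:1 ratio between keys and documents suggests the index is not scanning surplus keys for this filter; confirming where the 1070 ms goes would still take `explain("executionStats")` on the pipeline.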
Answer 0 (score: 0)
I would prefer to create the index with the fields in the same order as in your query: Subtype, IdTransaction, IdDistrict, IdCity, SortOrder
db.getSiblingDB("SonarImovel").Property.createIndex({Subtype: 1, IdTransaction: 1, IdDistrict: 1, IdCity: 1, SortOrder: 1})
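The suggestion amounts to taking the `$match` fields in the order they appear in the query and appending `SortOrder`. A hypothetical helper that derives that key pattern from the filter object is sketched below; it relies on JavaScript preserving the insertion order of non-integer string keys, and it only builds the key object, it does not talk to a database.

```javascript
// Hypothetical helper: build an ascending index key pattern whose fields
// follow the order of the $match filter, with extra fields appended.
function indexKeysFromMatch(match, extraFields = []) {
  const keys = {};
  for (const field of [...Object.keys(match), ...extraFields]) {
    keys[field] = 1; // a field listed twice keeps its first position
  }
  return keys;
}

// Filter taken from the profiled query.
const match = {
  Subtype: { $in: [13] },
  IdTransaction: 1,
  IdDistrict: { $in: [25938] },
  IdCity: 7994,
};

console.log(JSON.stringify(indexKeysFromMatch(match, ["SortOrder"])));
// {"Subtype":1,"IdTransaction":1,"IdDistrict":1,"IdCity":1,"SortOrder":1}
```

The resulting object is the same key pattern passed to `createIndex` in the answer.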