Question

We are currently working with a collection named items, which contains 10 million entries in our MongoDB database.

This collection contains two fields named title and country_code (among many others). One such entry looks like this:

{
  "_id": ObjectId("566acf868fdd29578f35e8db"),
  "feed": ObjectId("566562f78fdd2933aac85b42"),
  "category": "Mobiles & Tablets",
  "title": "360DSC Crystal Clear Transparent Ultra Slim Shockproof TPU Case for Iphone 5 5S (Transparent Pink)",
  "URL": "http://www.lazada.co.id/60dsc-crystal-clear-transparent-ultra-slim-shockproof-tpu-case-for-iphone-5-5s-transparent-pink-3235992.html",
  "created_at": ISODate("2015-12-11T13:28:38.470Z"),
  "barcode": "36834ELAA1XCWOANID-3563358",
  "updated_at": ISODate("2015-12-11T13:28:38.470Z"),
  "country_code": "ID",
  "picture-url": "http://id-live.slatic.net/p/image-2995323-1-product.jpg",
  "price": "41000.00"
}

The country_code field has very high cardinality. We created a text index over these two fields:

db.items.createIndex({title: "text", country_code: "text"})

In our example, we try to run this query:

db.items.find({"title": { "$regex": "iphone", "$options": "i" }, country_code: "US"}).limit(10)

This query takes about 6 seconds to complete, which seems unusually slow for this type of database.

Whenever we try a country_code with fewer results (for example country_code: "UK"), it returns results within a few milliseconds.

Is there any particular reason why these queries differ so much in how long they take to return results?
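One way to investigate the difference is to compare execution statistics for the two country codes. The following is a minimal sketch for mongosh, not a definitive diagnosis; the helper name regexQueryStats is hypothetical, and it assumes db is the shell's handle to the database that holds items.

```javascript
// Hypothetical helper: runs the regex query from the question with
// explain("executionStats") and returns only the counters of interest.
function regexQueryStats(db, countryCode) {
  const stats = db.items
    .find({ title: { $regex: "iphone", $options: "i" }, country_code: countryCode })
    .limit(10)
    .explain("executionStats").executionStats;

  return {
    nReturned: stats.nReturned,                    // documents returned
    totalKeysExamined: stats.totalKeysExamined,    // index entries scanned
    totalDocsExamined: stats.totalDocsExamined,    // documents scanned
    executionTimeMillis: stats.executionTimeMillis // server-side time
  };
}

// Usage in mongosh (assumes the items collection from the question):
//   printjson(regexQueryStats(db, "US"));
//   printjson(regexQueryStats(db, "UK"));
```

If totalDocsExamined is far larger than nReturned, the server is scanning many documents before it finds ten matches, which would suggest the text index is not helping the $regex filter.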

Edit: All the answers here were helpful; if you have this problem yourself, try all 3 solutions. However, I can only mark one as accepted.

Answer 1

Switch the order of the fields in the index. Order matters.

db.items.createIndex({country_code: "text", title: "text"})

Make sure you keep this order when querying:

db.items.find({country_code: "US", "title": { "$regex": "iphone", "$options": "i" }}).limit(10)

Doing so greatly reduces the number of title values that have to be searched for the substring.

Also, as @Jaco mentioned, you should take advantage of your "text" index. See how to query a text index here.

Answer 2

Since you are doing an exact match on country_code, you can put the text index on title only:

db.items.createIndex({title:"text"})

And add a separate index on country_code:

db.items.createIndex({country_code:1})

Since you have defined a text index on title, you don't have to use a regular expression; instead, you can do a text search like this:

db.items.find({$text:{$search:"iphone"},country_code:"US"})
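A collection can have only one text index, so the existing index over title and country_code would have to be dropped before the two indexes above can be created. The following is a minimal sketch of that switch for mongosh; the helper name switchToTitleTextIndex is hypothetical, and it assumes db points at the database that holds items.

```javascript
// Hypothetical helper: replaces whatever text index exists on items
// with a text index on title plus a regular index on country_code.
function switchToTitleTextIndex(db) {
  // Text indexes are reported by getIndexes() with a key of { _fts: "text", ... }.
  db.items
    .getIndexes()
    .filter((ix) => ix.key._fts === "text")
    .forEach((ix) => db.items.dropIndex(ix.name));

  db.items.createIndex({ title: "text" });   // used by $text searches
  db.items.createIndex({ country_code: 1 }); // used by the equality match
}

// Usage in mongosh:
//   switchToTitleTextIndex(db);
//   db.items.find({ $text: { $search: "iphone" }, country_code: "US" }).limit(10);
```

Note that $text matches whole (stemmed) words rather than arbitrary substrings, so its results can differ from those of the original $regex query.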

Answer 3

You should build an index like {country_code: 1, title: "text"}.

An equality match is much faster than a regular expression.
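A minimal sketch of how such an index might be created and queried in mongosh follows. The names compoundTextIndex, createCountryTextIndex and textSearchInCountry are hypothetical. It assumes any existing text index on items has already been dropped, since a collection can hold only one. With a compound text index whose text key is preceded by another key, a $text query has to include an equality condition on that preceding key, which the query below does.

```javascript
// Index shape suggested in this answer: equality prefix, then the text key.
const compoundTextIndex = { country_code: 1, title: "text" };

// Hypothetical helper: creates the compound text index on items.
function createCountryTextIndex(db) {
  return db.items.createIndex(compoundTextIndex);
}

// Hypothetical helper: text search restricted to one country.
// The equality match on country_code is required by the index prefix.
function textSearchInCountry(db, term, countryCode) {
  return db.items
    .find({ country_code: countryCode, $text: { $search: term } })
    .limit(10);
}

// Usage in mongosh:
//   createCountryTextIndex(db);
//   textSearchInCountry(db, "iphone", "US");
```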

MongoDB performance issue