Poor MongoDB read performance

Time: 2014-05-23 07:27:38

Tags: java performance mongodb

I have a sharded collection containing flight information. The schema looks like this:

 
{

    "_id" : ObjectId("537ef1bb5516dd401b5b109a"),
    "departureAirport" : "HAJ",
    "arrivalAirport" : "AYT",
    "departureDate" : NumberLong("1412553600000"),
    "operatingAirlineCode" : "DE",
    "operatingFlightNumber" : "1808",
    "flightClass" : "P",
    "fareType" : "EX",
    "availability" : "*"
}

Here are the stats for my collection:

{

    "sharded" : true,
    "systemFlags" : 1,
    "userFlags" : 1,
    "ns" : "flights.flight",
    "count" : 2809822,
    "numExtents" : 30,
    "size" : 674357280,
    "storageSize" : 921788416,
    "totalIndexSize" : 287746144,
    "indexSizes" : {
        "_id_" : 103499984,"departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 184246160
    },
    "avgObjSize" : 240,
    "nindexes" : 2,
    "nchunks" : 869,
    "shards" : {
        "shard0000" : {
            "ns" : "flights.flight",
            "count" : 1396165,
            "size" : 335079600,
            "avgObjSize" : 240,
            "storageSize" : 460894208,
            "numExtents" : 15,
            "nindexes" : 2,
            "lastExtentSize" : 124993536,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 1,
            "totalIndexSize" : 144633440,
            "indexSizes" : {
                "_id_" : 53094944,"departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 91538496
            },
            "ok" : 1
         },
        "shard0001" : {
            "ns" : "flights.flight",
            "count" : 1413657,
            "size" : 339277680,
            "avgObjSize" : 240,
            "storageSize" : 460894208,
            "numExtents" : 15,
            "nindexes" : 2,
            "lastExtentSize" : 124993536,
            "paddingFactor" : 1,
            "systemFlags" : 1,
            "userFlags" : 1,
            "totalIndexSize" : 143112704,
            "indexSizes" : {
                "_id_" : 50405040,"departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1" : 92707664
            },
            "ok" : 1
        }
    },
    "ok" : 1
}

I now run queries from Java that look like this:

{
    "departureAirport" : "BSL",
    "arrivalAirport" : "SMF",
    "departureDate" : { 
        "$gte" : 1402617600000,
        "$lte" : 1403136000000
    },
    "flightClass" : "C",
    "$or" : [ 
        { "availability" : { "$gte" : "3"}},
        { "availability" : "*"}
    ] , 
    "fareType" : "OW"
}

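The departure date should fall within a one-week range, and the availability should be either at least the requested number or '*'.

My question is what I can do to improve performance. When I query the database with 50 connections per host, I only get about 1000 ops/sec, but I need about 3000 - 5000 ops/sec.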

When I run the query in the shell, the cursor looks fine:

"cursor" : "BtreeCursor departureAirport_1_arrivalAirport_1_departureDate_1_flightClass_1_availability_1_fareType_1"

If I forgot anything, please let me know. Thanks in advance.

1 Answer:

Answer 0: (score: 4)

The fact that a BtreeCursor is used doesn't by itself make the query OK. The output of explain would help to pin down the problem.
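As a rough illustration of what to look for: in the explain output of MongoDB 2.x, `n` is the number of documents returned and `nscanned` the number of index entries examined. The sketch below assumes these two values have been copied out of the explain document by hand (through mongos the output may also contain per-shard sections, which are ignored here). The class name, method name and sample numbers are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: judges an explain() result by its nscanned / n ratio. */
final class ExplainCheck {

    /**
     * Returns how many index entries were scanned per returned document.
     * A value close to 1.0 means the index is selective for this query;
     * a large value means many entries are scanned and then thrown away.
     */
    static double scanRatio(Map<String, ? extends Number> explain) {
        long n = explain.get("n").longValue();
        long nscanned = explain.get("nscanned").longValue();
        // Avoid dividing by zero when the query matched nothing.
        return n == 0 ? nscanned : (double) nscanned / n;
    }

    public static void main(String[] args) {
        // Made-up numbers, for illustration only.
        Map<String, Long> explain = new LinkedHashMap<>();
        explain.put("n", 20L);
        explain.put("nscanned", 5000L);
        System.out.println(scanRatio(explain)); // prints 250.0
    }
}
```

A ratio far above 1 despite a BtreeCursor would support the suspicion that the index is being walked over a wide range instead of a tight one.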

My guess is that a key problem is the order of the query parameters:

// equality, good
"departureAirport" : "BSL", 
// equality, good
"arrivalAirport" : "SMF",
// range, bad because index based range queries should be near the end
// of contiguous index-based equality checks
"departureDate" : { 
    "$gte" : 1402617600000,
    "$lte" : 1403136000000
},
// what is this, and how many possible values does it have? Seems to be
// a low selectivity index -> remove from index and move to end
"flightClass" : "C",
// costly $or, one op. is a range query, the other one equality...
// Simply set 'availability' to a magic number instead. That's
// ugly, but optimizations are ugly and it's unlikely we see planes with
// over e.g. 900,000 seats in the next couple of decades...
"$or" : [ 
    { "availability" : { "$gte" : "3"}},
    { "availability" : "*"}
] , 
// again, looks like low selectivity to me. Since it's already at the end, 
// that's ok. I'd try to remove it from the index, however.
"fareType" : "OW"

You might want to change the index to

"departureAirport_1_arrivalAirport_1_departureDate_1_availability_1"

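and query in exactly the same order. Append everything else to the end of the query, so that those remaining criteria only have to be checked against documents that already match all the criteria covered by the index.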
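A minimal sketch of a query built in that order, assuming the index is reduced to the four fields above and that availability has been turned into a number (see the comment on the `$or` clause). It uses a plain `LinkedHashMap` to keep insertion order. With the legacy Java driver the same `put` calls would go on a `BasicDBObject`. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder that emits the query fields in index order. */
final class FlightQuery {

    static Map<String, Object> build(String from, String to,
                                     long dateFrom, long dateTo,
                                     int minSeats,
                                     String flightClass, String fareType) {
        Map<String, Object> dateRange = new LinkedHashMap<>();
        dateRange.put("$gte", dateFrom);
        dateRange.put("$lte", dateTo);

        Map<String, Object> seats = new LinkedHashMap<>();
        seats.put("$gte", minSeats);

        Map<String, Object> query = new LinkedHashMap<>();
        // Fields covered by the proposed index, in index order.
        query.put("departureAirport", from);
        query.put("arrivalAirport", to);
        query.put("departureDate", dateRange);
        query.put("availability", seats);
        // Not in the index: checked only on documents that passed the above.
        query.put("flightClass", flightClass);
        query.put("fareType", fareType);
        return query;
    }

    public static void main(String[] args) {
        Map<String, Object> q =
            build("BSL", "SMF", 1402617600000L, 1403136000000L, 3, "C", "OW");
        System.out.println(q.keySet());
    }
}
```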

I'm assuming that flightClass and fareType have low selectivity. If that isn't the case, this won't be the best solution.
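The "magic number" idea from the comment on the `$or` clause could look roughly like this on the write side. It assumes availability is stored as a number instead of a string, and that 999999 is an acceptable stand-in for '*'. Both the constant and the class name are made up for illustration.

```java
/** Hypothetical encoder that replaces the "*" wildcard with a large number. */
final class Availability {

    /** Assumed sentinel meaning "unlimited"; stands in for "*". */
    static final int UNLIMITED = 999_999;

    /** Converts the raw value ("*" or a seat count such as "3") to an int. */
    static int encode(String raw) {
        String s = raw.trim();
        return "*".equals(s) ? UNLIMITED : Integer.parseInt(s);
    }

    public static void main(String[] args) {
        System.out.println(encode("*")); // prints 999999
        System.out.println(encode("3")); // prints 3
    }
}
```

With this encoding a single `{ "availability" : { "$gte" : 3 } }` covers both branches of the `$or`. A side effect is that seat counts compare numerically; as strings, "10" would sort before "3".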