Unwind multiple document arrays into new documents

Question

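Today I ran into a situation where I need to sync a MongoDB collection to Vertica (a SQL database), where my object keys will be the columns of the table in SQL. I use the MongoDB aggregation framework to first query, manipulate, and project the desired result documents, and then I sync them to Vertica.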

The schema I want to aggregate looks like this:

{
  userId: 123,
  firstProperty: {
    firstArray: ['x','y','z'],
    anotherAttr: 'abc'
  },
  anotherProperty: {
    secondArray: ['a','b','c'],
    anotherAttr: 'def'
  }  
}

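Since the values of one array are unrelated to the values of the other, what I need is for each value of the nested arrays to end up in a separate result document. For that I use the following aggregation pipeline: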

db.collection('myCollection').aggregate([
            {
                $match: {
                    $or: [
                        {'firstProperty.firstArray.1': {$exists: true}},
                        {'secondProperty.secondArray.1': {$exists: true}}
                    ]
                }
            },
            {
                $project: {
                    userId: 1,
                    firstProperty: 1,
                    secondProperty: 1
                }
            }, {
                $unwind: {path:'$firstProperty.firstArray'}
            }, {
                $unwind: {path:'$secondProperty.secondArray'},
            }, {
                $project: {
                    userId: 1,
                    firstProperty: '$firstProperty.firstArray',
                    firstPropertyAttr: '$firstProperty.anotherAttr',
                    secondProperty: '$secondProperty.secondArray',
                    secondPropertyAttr: '$secondProperty.anotherAttr'
                }
            }, {
                $out: 'another_collection'
            }
        ])

I expect to get the following result:

{
  userId: 'x1',
  firstProperty: 'x',
  firstPropertyAttr: 'a'
}
{
  userId: 'x1',
  firstProperty: 'y',
  firstPropertyAttr: 'a'
}
{
  userId: 'x1',
  firstProperty: 'z',
  firstPropertyAttr: 'a'
}
{
  userId: 'x1',
  secondProperty: 'a',
  firstPropertyAttr: 'b'
}
{
  userId: 'x1',
  secondProperty: 'b',
  firstPropertyAttr: 'b'
}
{
  userId: 'x1',
  secondProperty: 'c',
  firstPropertyAttr: 'b'
}

Instead, I get something like this:

{
  userId: 'x1',
  firstProperty: 'x',
  firstPropertyAttr: 'b',
  secondProperty: 'a',
  secondPropertyAttr: 'b'
}
{
  userId: 'x1',
  firstProperty: 'y',
  firstPropertyAttr: 'b',
  secondProperty: 'b',
  secondPropertyAttr: 'b'
}
{
  userId: 'x1',
  firstProperty: 'z',
  firstPropertyAttr: 'b',
  secondProperty: 'c',
  secondPropertyAttr: 'b'
}

What exactly am I missing, and how can I fix it?

Answer 1

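This is actually a much "curlier" problem than you might think, and it all comes down to "named keys". These are generally a real problem, and your data should not be using "data points" in the naming of such keys.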

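The other obvious problem in your attempt is the "Cartesian product". This is where you $unwind one array and then $unwind another, so every item from the "first" $unwind is repeated for every value present in the "second". With three values in each array, that means nine combined documents instead of six separate ones.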

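To address that second problem, the basic approach is to "combine the arrays" so that you only $unwind from a single source. This is common to all the remaining approaches.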
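To make the difference concrete, here is a minimal sketch in plain JavaScript rather than MongoDB code. The `unwind` helper and the flattened field names `first` and `second` are assumptions made for illustration, and the helper only models the simple array case of $unwind.

```javascript
// Plain JavaScript model of $unwind, for illustration only (not MongoDB code).
// `unwind` is a hypothetical helper: one output document per array element.
function unwind(docs, field) {
  return docs.flatMap(doc =>
    (doc[field] || []).map(value => ({ ...doc, [field]: value }))
  );
}

const source = [{ userId: 123, first: ['x', 'y', 'z'], second: ['a', 'b', 'c'] }];

// Two chained unwinds: every "first" value repeats for every "second" value.
const product = unwind(unwind(source, 'first'), 'second');
console.log(product.length); // 9

// Combine first, then unwind once: one document per original array element.
const combined = source.map(doc => ({
  userId: doc.userId,
  combined: [
    ...doc.first.map(value => ({ name: 'first', value })),
    ...doc.second.map(value => ({ name: 'second', value }))
  ]
}));
const flat = unwind(combined, 'combined');
console.log(flat.length); // 6
```

The second form is the shape every approach in this answer relies on: a single array per document, unwound once.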

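As for the approaches themselves, they differ in the MongoDB version you have available and in how practical they are to apply. Let's step through them: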

Remove the named keys

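The simplest approach is to not expect named keys in the output at all, and instead mark each entry with a "name" that identifies its source in the final output. So all we do is specify each "expected" key when constructing an initial "combined" array, and then $filter out any null values that result from named paths not existing in the current document.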

db.getCollection('myCollection').aggregate([
  { "$match": {
    "$or": [
      { "firstProperty.firstArray.0": { "$exists": true } },
      { "anotherProperty.secondArray.0": { "$exists": true } }
    ]  
  }},
  { "$project": {
    "_id": 0,
    "userId": 1,
    "combined": {
      "$filter": {
        "input": [
          { 
            "name": { "$literal": "first" },
            "array": "$firstProperty.firstArray",
            "attr": "$firstProperty.anotherAttr"
          },
          {
            "name": { "$literal": "another" },
            "array": "$anotherProperty.secondArray",
            "attr": "$anotherProperty.anotherAttr"
          }
        ],
        "cond": {
          "$ne": ["$$this.array", null ]
        }
      }
    }
  }},
  { "$unwind": "$combined" },
  { "$unwind": "$combined.array" },
  { "$project": {
    "userId": 1,
    "name": "$combined.name",
    "value": "$combined.array",
    "attr": "$combined.attr"
  }}
])

With the data included in the question, this produces:

/* 1 */
{
    "userId" : 123.0,
    "name" : "first",
    "value" : "x",
    "attr" : "abc"
}

/* 2 */
{
    "userId" : 123.0,
    "name" : "first",
    "value" : "y",
    "attr" : "abc"
}

/* 3 */
{
    "userId" : 123.0,
    "name" : "first",
    "value" : "z",
    "attr" : "abc"
}

/* 4 */
{
    "userId" : 123.0,
    "name" : "another",
    "value" : "a",
    "attr" : "def"
}

/* 5 */
{
    "userId" : 123.0,
    "name" : "another",
    "value" : "b",
    "attr" : "def"
}

/* 6 */
{
    "userId" : 123.0,
    "name" : "another",
    "value" : "c",
    "attr" : "def"
}
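As a rough check of what this shape means for a document that has only one of the two properties, the following plain JavaScript sketch models the intended result of the pipeline. The helper name `toNamedRows` and the sample document are assumptions for illustration, and the sketch does not reproduce MongoDB's own null handling.

```javascript
// Hypothetical helper modelling the first pipeline's output in plain JavaScript.
function toNamedRows(doc) {
  const sources = [
    { name: 'first', part: doc.firstProperty, arrayKey: 'firstArray' },
    { name: 'another', part: doc.anotherProperty, arrayKey: 'secondArray' }
  ];
  return sources
    // Skip properties that are absent or carry no array in this document.
    .filter(s => s.part && Array.isArray(s.part[s.arrayKey]))
    .flatMap(s => s.part[s.arrayKey].map(value => ({
      userId: doc.userId,
      name: s.name,
      value,
      attr: s.part.anotherAttr
    })));
}

// A document with only "firstProperty" yields only rows named "first".
const onlyFirst = {
  userId: 456,
  firstProperty: { firstArray: ['p', 'q'], anotherAttr: 'abc' }
};
const rows = toNamedRows(onlyFirst);
console.log(rows.length); // 2
```

Every row has the same four keys, so the output maps onto one table with fixed columns.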

Combine the objects - requires MongoDB 3.4.4 or later

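To actually use "named keys" we need the $objectToArray and $arrayToObject operators, which are only available from MongoDB 3.4.4. Using these and the $replaceRoot pipeline stage, we can produce your desired output without explicitly naming the output keys at any stage: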

db.getCollection('myCollection').aggregate([
  { "$match": {
    "$or": [
      { "firstProperty.firstArray.0": { "$exists": true } },
      { "anotherProperty.secondArray.0": { "$exists": true } }
    ]  
  }},
  { "$project": {
    "_id": 0,
    "userId": 1,
    "data": {
      "$reduce": {
        "input": {
          "$map": {
            "input": {
              "$filter": {
                "input": { "$objectToArray": "$$ROOT" },
                "cond": { "$not": { "$in": [ "$$this.k", ["_id", "userId"] ] } }
              }
            },
            "as": "d",
            "in": {
              "$let": {
                "vars": {
                  "inner": {
                    "$map": {
                      "input": { "$objectToArray": "$$d.v" },
                      "as": "i",
                      "in": {
                        "k": {
                          "$cond": {
                            "if": { "$ne": [{ "$indexOfCP": ["$$i.k", "Array"] }, -1] },
                            "then": "$$d.k",
                            "else": { "$concat": ["$$d.k", "Attr"] }
                          }  
                        },
                        "v": "$$i.v"
                      }
                    }
                  }
                },
                "in": {
                  "$map": {
                    "input": { 
                      "$arrayElemAt": [
                        "$$inner.v",
                        { "$indexOfArray": ["$$inner.k", "$$d.k"] } 
                      ]
                    },
                    "as": "v",
                    "in": {
                      "$arrayToObject": [[
                        { "k": "$$d.k", "v": "$$v" },
                        { 
                          "k": { "$concat": ["$$d.k", "Attr"] },
                          "v": {
                            "$arrayElemAt": [
                              "$$inner.v",
                              { "$indexOfArray": ["$$inner.k", { "$concat": ["$$d.k", "Attr"] }] }
                            ]
                          }
                        }
                      ]]
                    }
                  }
                }
              }
            }
          }
        },
        "initialValue": [],
        "in": { "$concatArrays": [ "$$value", "$$this" ] }
      }
    }
  }},
  { "$unwind": "$data" },
  { "$replaceRoot": {
    "newRoot": {
      "$arrayToObject": {
        "$concatArrays": [
          [{ "k": "userId", "v": "$userId" }],
          { "$objectToArray": "$data" }
        ]
      } 
    }   
  }}
])

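This gets pretty monstrous: it converts the "keys" into an array, then the "sub-keys" into an array, and maps the values from those inner arrays onto the pair of keys in the output.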

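The key part is $objectToArray, which is needed to "transform" your "nested key" structures into arrays of "k" and "v", representing the "name" of the key and its "value". It is called twice. The first call handles the "outer" part of the document, excluding "constant" fields such as "_id" and "userId". The second call is applied to each of those "array" elements to turn the "inner keys" into a similar "array".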
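The two calls can be pictured with a plain JavaScript analogue. This is only a sketch: the `objectToArray` function below is an assumption that mimics the "k"/"v" output shape and is not the MongoDB operator itself.

```javascript
// Plain JavaScript analogue of $objectToArray, for illustration only.
const objectToArray = obj => Object.entries(obj).map(([k, v]) => ({ k, v }));

const doc = {
  userId: 123,
  firstProperty: { firstArray: ['x', 'y', 'z'], anotherAttr: 'abc' },
  anotherProperty: { secondArray: ['a', 'b', 'c'], anotherAttr: 'def' }
};

// First call: the outer document, minus the constant fields.
const outer = objectToArray(doc).filter(e => !['_id', 'userId'].includes(e.k));

// Second call: the inner keys of each remaining entry.
const inner = outer.map(e => ({ k: e.k, v: objectToArray(e.v) }));

console.log(JSON.stringify(inner[0]));
// {"k":"firstProperty","v":[{"k":"firstArray","v":["x","y","z"]},{"k":"anotherAttr","v":"abc"}]}
```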

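Matching is then done with $indexOfCP to work out which "inner key" holds the values and which is the "Attr". The keys are then renamed to the "outer" key value, which we can access because it is the "k" supplied courtesy of $objectToArray.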
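The renaming rule is small enough to sketch in plain JavaScript. The function name `renameInner` is hypothetical, and `indexOf` stands in for $indexOfCP here, which likewise returns -1 when the substring does not occur.

```javascript
// Analogue of the $cond / $indexOfCP step; renameInner is a hypothetical name.
function renameInner(outerKey, innerEntries) {
  return innerEntries.map(({ k, v }) => ({
    // -1 means "Array" does not occur in the inner key, so it is the attribute.
    k: k.indexOf('Array') !== -1 ? outerKey : outerKey + 'Attr',
    v
  }));
}

const renamed = renameInner('firstProperty', [
  { k: 'firstArray', v: ['x', 'y', 'z'] },
  { k: 'anotherAttr', v: 'abc' }
]);
console.log(renamed.map(e => e.k)); // [ 'firstProperty', 'firstPropertyAttr' ]
```

Note that this relies on the inner array key containing the text "Array", which is a property of the sample data rather than a general rule.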

Then for the "inner value", which is an "array", we $map each entry into a combined "array" that basically has the following form:

[
  { "k": "firstProperty", "v": "x" },
  { "k": "firstPropertyAttr", "v": "abc" }
]

This happens for each "inner array" element, for which $arrayToObject reverses the process and turns each "k" and "v" into the "key" and "value" of an object respectively.
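A plain JavaScript analogue of this step might look as follows. The `arrayToObject` function is an assumption that mirrors the operator's behaviour for "k"/"v" pairs, and the input is the renamed form for one outer key.

```javascript
// Plain JavaScript analogue of $arrayToObject, for illustration only.
const arrayToObject = pairs => Object.fromEntries(pairs.map(({ k, v }) => [k, v]));

// Renamed inner entries for one outer key.
const inner = [
  { k: 'firstProperty', v: ['x', 'y', 'z'] },
  { k: 'firstPropertyAttr', v: 'abc' }
];
const values = inner.find(e => e.k === 'firstProperty').v;
const attr = inner.find(e => e.k === 'firstPropertyAttr').v;

// One small object per array value, each carrying the shared attribute.
const objects = values.map(v => arrayToObject([
  { k: 'firstProperty', v },
  { k: 'firstPropertyAttr', v: attr }
]));
console.log(objects[0]); // { firstProperty: 'x', firstPropertyAttr: 'abc' }
```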

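Since the output at this point is still an "array of arrays" of the "inner keys", $reduce wraps that output and applies $concatArrays while processing each element in order to "join" everything into a single array for "data".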
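In plain JavaScript the same flattening is a one-line `reduce`. The sketch below uses shortened sample data, and the variable names are assumptions chosen to echo the `$$value` and `$$this` variables of the pipeline.

```javascript
// Analogue of $reduce with $concatArrays: flatten an array of arrays by one level.
const perKey = [
  [
    { firstProperty: 'x', firstPropertyAttr: 'abc' },
    { firstProperty: 'y', firstPropertyAttr: 'abc' }
  ],
  [
    { anotherProperty: 'a', anotherPropertyAttr: 'def' }
  ]
];

// `value` is the accumulator and `current` the element being processed,
// starting from an empty array just like "initialValue": [].
const data = perKey.reduce((value, current) => value.concat(current), []);
console.log(data.length); // 3
```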

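All that remains is to $unwind the array produced from each source document and then apply $replaceRoot, which is the part that actually allows "different key names" at the "root" of each output document.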

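The "merging" here is done by supplying an array containing an object of the same "k" and "v" construction for "userId", and "concatenating" that with the $objectToArray transform of "data". This "new array" is then converted to an object via $arrayToObject one final time, and that object forms the argument to "newRoot" as an expression.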
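The merge can be mirrored in plain JavaScript as a sketch. Both helper functions are assumptions standing in for the operators, and `unwound` represents one document as it looks after the $unwind on "data".

```javascript
// Plain JavaScript stand-ins for the two operators, for illustration only.
const objectToArray = obj => Object.entries(obj).map(([k, v]) => ({ k, v }));
const arrayToObject = pairs => Object.fromEntries(pairs.map(({ k, v }) => [k, v]));

// One document as it looks after unwinding "data".
const unwound = {
  userId: 123,
  data: { firstProperty: 'x', firstPropertyAttr: 'abc' }
};

// Prepend the userId pair, append the pairs from "data", convert back to an object.
const newRoot = arrayToObject(
  [{ k: 'userId', v: unwound.userId }].concat(objectToArray(unwound.data))
);
console.log(newRoot); // { userId: 123, firstProperty: 'x', firstPropertyAttr: 'abc' }
```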

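You do something like this when there are a large number of "named keys" that you cannot really name explicitly. And it actually gives you the result you want: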

/* 1 */
{
    "userId" : 123.0,
    "firstProperty" : "x",
    "firstPropertyAttr" : "abc"
}

/* 2 */
{
    "userId" : 123.0,
    "firstProperty" : "y",
    "firstPropertyAttr" : "abc"
}

/* 3 */
{
    "userId" : 123.0,
    "firstProperty" : "z",
    "firstPropertyAttr" : "abc"
}

/* 4 */
{
    "userId" : 123.0,
    "anotherProperty" : "a",
    "anotherPropertyAttr" : "def"
}

/* 5 */
{
    "userId" : 123.0,
    "anotherProperty" : "b",
    "anotherPropertyAttr" : "def"
}

/* 6 */
{
    "userId" : 123.0,
    "anotherProperty" : "c",
    "anotherPropertyAttr" : "def"
}

Named keys without MongoDB 3.4.4 or later

Without the operator support shown in the listing above, the aggregation framework simply cannot output documents with different key names.

So although it is not possible to instruct "the server" to do this via $out, you can of course simply iterate the cursor and write to a new collection:

var ops = [];

db.getCollection('myCollection').find().forEach( d => {
  ops = ops.concat(Object.keys(d).filter(k => ['_id','userId'].indexOf(k) === -1 )
    .map(k => 
      d[k][Object.keys(d[k]).find(ki => /Array$/.test(ki))]
        .map(v => ({
          [k]: v,
          [`${k}Attr`]: d[k][Object.keys(d[k]).find(ki => /Attr$/.test(ki))]
      }))
    )
    .reduce((acc,curr) => acc.concat(curr),[])
    .map( o => Object.assign({ userId: d.userId },o) )
  );

  if (ops.length >= 1000) {
    db.getCollection("another_collection").insertMany(ops);
    ops = [];
  }

})

if ( ops.length > 0 ) {
  db.getCollection("another_collection").insertMany(ops);
  ops = [];
}

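This is the same sort of thing as the earlier aggregation, but done "externally". For each source document it produces an array of documents built from the "inner" arrays, like so: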

[ 
    {
        "userId" : 123.0,
        "firstProperty" : "x",
        "firstPropertyAttr" : "abc"
    }, 
    {
        "userId" : 123.0,
        "firstProperty" : "y",
        "firstPropertyAttr" : "abc"
    }, 
    {
        "userId" : 123.0,
        "firstProperty" : "z",
        "firstPropertyAttr" : "abc"
    }, 
    {
        "userId" : 123.0,
        "anotherProperty" : "a",
        "anotherPropertyAttr" : "def"
    }, 
    {
        "userId" : 123.0,
        "anotherProperty" : "b",
        "anotherPropertyAttr" : "def"
    }, 
    {
        "userId" : 123.0,
        "anotherProperty" : "c",
        "anotherPropertyAttr" : "def"
    }
]

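These get "cached" into one big array which, once it reaches a length of 1000 or more, is finally written to the new collection via .insertMany(). Of course this requires "back and forth" communication with the server, but it does get the job done in the most efficient way possible if you do not have the features needed for the previous aggregation.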
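If you want to test the expansion and the batching separately from the database, one possible arrangement is sketched below. The names `expandDocument`, `makeBatcher`, and `writeBatch` are all hypothetical. In the shell, `writeBatch` could wrap the insertMany() call, while here it is replaced by an in-memory array.

```javascript
// Expand one source document into its output documents. Same idea as the
// shell loop, pulled out into a pure function so it can be tested on its own.
function expandDocument(d) {
  return Object.keys(d)
    .filter(k => !['_id', 'userId'].includes(k))
    .flatMap(k => {
      const inner = d[k];
      const arrayKey = Object.keys(inner).find(ki => /Array$/.test(ki));
      const attrKey = Object.keys(inner).find(ki => /Attr$/.test(ki));
      return (inner[arrayKey] || []).map(v => ({
        userId: d.userId,
        [k]: v,
        [`${k}Attr`]: inner[attrKey]
      }));
    });
}

// Buffer output documents and hand them to `writeBatch` once the buffer is full.
// `writeBatch` is a hypothetical callback supplied by the caller.
function makeBatcher(writeBatch, size = 1000) {
  let ops = [];
  return {
    add(docs) {
      ops = ops.concat(docs);
      if (ops.length >= size) { writeBatch(ops); ops = []; }
    },
    flush() {
      if (ops.length > 0) { writeBatch(ops); ops = []; }
    }
  };
}

// Usage with an in-memory stand-in for the target collection and a tiny batch size.
const written = [];
const batcher = makeBatcher(batch => written.push(batch), 4);

batcher.add(expandDocument({
  userId: 123,
  firstProperty: { firstArray: ['x', 'y', 'z'], anotherAttr: 'abc' },
  anotherProperty: { secondArray: ['a', 'b', 'c'], anotherAttr: 'def' }
})); // six documents reach the threshold of 4, so they are written at once

batcher.add(expandDocument({
  userId: 7,
  firstProperty: { firstArray: ['q'], anotherAttr: 'r' }
}));
batcher.flush(); // writes the one remaining document

console.log(written.map(batch => batch.length)); // [ 6, 1 ]
```

As in the shell version, a batch can be larger than the threshold, because the buffer is only checked after a whole source document has been added.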

Conclusion

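The overall point is that unless you actually have a MongoDB version that supports it, you are not going to get documents with "different key names" in the output from the aggregation pipeline alone.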

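So when you do not have that support, you either go with the first option and then use $out, giving up on named keys, or you take the final approach and simply manipulate the cursor results and write them back to the new collection.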
