Question

After digging through Google and SO for a week, I've ended up asking the question here. Suppose there are two collections:

UsersCollection:

[
{...
    name: "James",
    userregex: "a|regex|str|here"
},
{...
    name: "James",
    userregex: "another|regex|string|there"
},
...
]

PostCollection:

[
{...
    title:"a string here ..."
},
{...
    title: "another string here ..."
},
...
]

I need to get all users whose userregex matches any post.title (I need user_id, post_id pairs or something similar).
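To make the desired result concrete, here is a minimal sketch in plain JavaScript over in-memory arrays. The sample data, the user_id and post_id values, and the function name matchUsersToPosts are assumptions for illustration only; they stand in for the two collections and are not part of the real schema.

```javascript
// Hypothetical in-memory stand-ins for the two collections.
const users = [
  { user_id: 1, name: "James", userregex: "regex|here" },
  { user_id: 2, name: "John", userregex: "another|there" },
  { user_id: 3, name: "Jane", userregex: "nomatch" },
];
const posts = [
  { post_id: 10, title: "a string here" },
  { post_id: 11, title: "another string here" },
];

// Test every user's regex against every post title and
// collect one { user_id, post_id } pair per match.
function matchUsersToPosts(users, posts) {
  const pairs = [];
  for (const user of users) {
    const regex = new RegExp(user.userregex);
    for (const post of posts) {
      if (regex.test(post.title)) {
        pairs.push({ user_id: user.user_id, post_id: post.post_id });
      }
    }
  }
  return pairs;
}

const pairs = matchUsersToPosts(users, posts);
console.log(pairs);
```

This is the brute-force version: it does users × posts regex tests, which is exactly the cost I'm trying to avoid.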

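What I've tried so far:
1. Fetch all users from the collection and run each regex against all posts. It works, but it's too dirty: it has to execute a query for every user.
2. Same as above, but using forEach inside a Mongo query. It's the same thing, just at the database layer instead of the application layer.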

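I've searched a lot for available approaches such as aggregation, $unwind, etc., with no luck. So is it possible to do this in Mongo? Should I change my database type? If so, which type would be a good fit? Performance is my first priority. Thanks.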

Answer 1

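MongoDB is fine for your use case, but you need a different approach from the one you're using now. Since you only care whether a user's regex matches any post title at all, you can store the last result of such a match. Here is some sample code: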

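// Process only users that have no cached match yet.
db.users.find({last_post_id: {$exists: 0}}).forEach(
   function(row) {
       var regex = new RegExp(row['userregex']);
       // Find any one post whose title matches this user's regex.
       var found = db.post_collection.findOne({title: regex});
       if (found) {
           // Declare with var so post_id does not leak as a global.
           var post_id = found["post_id"];
           db.users.updateOne({
                 user_id: row["user_id"]
               }, {
                    $set: {last_post_id: post_id}
               });
       }
   }
)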

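What it does is select only the users that don't have last_post_id set, search for a matching post, and set last_post_id when one is found. After running it, you can return the results like this: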

db.users.find({last_post_id: {$exists: 1}}, {user_id:1, last_post_id:1, _id:0})

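The only thing you need to take care of is edits and deletes of existing posts. After every edit or delete, run the following so that all users matched to that post ID get matched again.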

post_id_changed = 1
db.users.updateMany({last_post_id: post_id_changed}, {$unset: {last_post_id: 1}})

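This makes sure those users are processed again the next time you run the update. The approach does have one drawback: for every user whose regex matches no title, the query will be run again and again. You can work around that with a timestamp or a post-count check.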
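The post-count check mentioned above could look roughly like the sketch below. It is plain JavaScript over a user document, and checked_post_count is a hypothetical field (not in the original schema) that would record how many posts existed the last time the user's regex was tried without success.

```javascript
// Decide whether a user's regex needs to be run again.
// `checked_post_count` is a hypothetical field: the number of posts that
// existed when this user was last checked and nothing matched.
function needsCheck(user, currentPostCount) {
  // A cached match already exists, so there is nothing to do.
  if (user.last_post_id !== undefined) return false;
  // No match so far: only retry if the post count has changed since then.
  return user.checked_post_count !== currentPostCount;
}

console.log(needsCheck({ user_id: 1, last_post_id: 42 }, 10));
console.log(needsCheck({ user_id: 2, checked_post_count: 10 }, 10));
console.log(needsCheck({ user_id: 3, checked_post_count: 9 }, 10));
```

A count alone does not notice an edited title, since the number of posts stays the same, so edits would also have to clear the counter (or you would compare timestamps instead).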

Also, you should make sure to put an index on post_collection.title.

Answer 2

I was thinking that if you pre-tokenized your post titles like this:

{
  "_id": ...
  "title": "Another string there",
  "keywords": [
    "another",
    "string",
    "there"
  ]
}

But unfortunately $lookup requires foreignField to be a single element, so my idea for something like this won't work :( But maybe it will give you another idea?

db.Post.aggregate([
   {$lookup: {
          from: "Users",
          localField: "keywords",
          foreignField: "keywords",
          as: "users"
        }
    }
])
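For what it's worth, the keywords array in the pre-tokenized document above could be produced with something like the following sketch. The function name tokenizeTitle is made up for illustration, and the splitting rule (lowercase, split on anything that is not an ASCII letter or digit, drop duplicates) is an assumption; titles in other scripts would need a different rule.

```javascript
// Split a title into lowercase keywords.
// Assumption: keywords are runs of ASCII letters/digits; duplicates are dropped.
function tokenizeTitle(title) {
  const words = title
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean); // remove empty strings left by leading/trailing separators
  return [...new Set(words)];
}

console.log(tokenizeTitle("Another string there"));
```

The resulting array would be stored next to the title whenever a post is inserted or edited.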

Answer 3

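You can't reference a regex field stored in a document from the $regex operator inside a match expression.

So it can't be done on the Mongo side with the current structure.

$lookup works with equality conditions. So one alternative (similar to what Nic suggested) is to update your posts collection to include an extra field called keywords (an array of keyword values that can be searched on).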

db.users.aggregate([
   {$lookup: {
          from: "posts",
          localField: "userregex",
          foreignField: "keywords",
          as: "posts"
        }
    }
])

The query above will do something like this (as of 3.4):

keywords: { $in: [ userregex.elem1, userregex.elem2, ... ] }.

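From the docs:

If the field holds an array, then the $in operator selects the documents whose field holds an array that contains at least one element that matches a value in the specified array (for example, <value1>, <value2>, and so on).

It looks like earlier versions (tested on 3.2) only match when the arrays have the same order, the same values, and the same length.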

Sample input:

Users

db.users.insertMany([
  {
    "name": "James",
    "userregex": [
      "another",
      "here"
    ]
  },
  {
    "name": "John",
    "userregex": [
      "another",
      "string"
    ]
  }
])
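The userregex arrays above replace the original pipe-separated strings. A minimal sketch of that conversion follows, assuming each stored pattern is nothing more than plain words joined by |; the function name regexToKeywords is made up, and anything with real regex syntax is rejected rather than guessed at.

```javascript
// Convert a pattern like "another|regex|string|there" into a keyword array.
// Returns null when the pattern is not a plain alternation of words, because
// such a pattern cannot be expressed as an equality match on keywords.
function regexToKeywords(pattern) {
  const parts = pattern.split("|");
  const allPlainWords = parts.every((part) => /^[A-Za-z0-9]+$/.test(part));
  return allPlainWords ? parts : null;
}

console.log(regexToKeywords("another|regex|string|there"));
console.log(regexToKeywords("a.*b|c"));
```

Keep in mind that the meaning changes slightly: the regex "str" matches inside "string", while the keyword "str" only matches a keyword that is exactly "str".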

Posts

db.posts.insertMany([
  {
    "title": "a string here",
    "keywords": [
      "here"
    ]
  },
  {
    "title": "another string here",
    "keywords": [
      "another",
      "here"
    ]
  },
  {
    "title": "one string here",
    "keywords": [
      "string"
    ]
  }
])

Sample output:

[
  {
    "name": "James",
    "userregex": [
      "another",
      "here"
    ],
    "posts": [
      {
        "title": "another string here",
        "keywords": [
          "another",
          "here"
        ]
      },
      {
        "title": "a string here",
        "keywords": [
          "here"
        ]
      }
    ]
  },
  {
    "name": "John",
    "userregex": [
      "another",
      "string"
    ],
    "posts": [
      {
        "title": "another string here",
        "keywords": [
          "another",
          "here"
        ]
      },
      {
        "title": "one string here",
        "keywords": [
          "string"
        ]
      }
    ]
  }
]
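To see why the sample input gives this output, here is a plain JavaScript emulation of the "at least one common element" matching described above. It is only a sketch over in-memory copies of the sample data, not how MongoDB runs $lookup internally; lookupPosts is a made-up name, and the order of the matched posts is not meant to mirror the server.

```javascript
// In-memory copies of the sample input.
const users = [
  { name: "James", userregex: ["another", "here"] },
  { name: "John", userregex: ["another", "string"] },
];
const posts = [
  { title: "a string here", keywords: ["here"] },
  { title: "another string here", keywords: ["another", "here"] },
  { title: "one string here", keywords: ["string"] },
];

// Attach to each user every post that shares at least one keyword
// with the user's userregex array.
function lookupPosts(users, posts) {
  return users.map((user) => ({
    ...user,
    posts: posts.filter((post) =>
      post.keywords.some((keyword) => user.userregex.includes(keyword))
    ),
  }));
}

const result = lookupPosts(users, posts);
console.log(JSON.stringify(result, null, 2));
```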

mongodb check regex on a field from one collection against all fields in another collection
