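I am working on a data analysis of CV data stored in a large MongoDB collection. I am trying to compute the absolute frequency of the words that appear in job titles (the jobs.jobTitle field in the schema below).
The documents are structured as follows: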
{
  firstName: String,
  lastName: String,
  jobs: [
    {jobTitle: 'software architect', company: String, ...},
    {jobTitle: 'full stack software engineer', company: String, ...},
    {jobTitle: 'javascript developer', company: String, ...}
  ],
  ...
}
I would like to go through the whole collection and get a result like this:
[{word: 'manager', count: 3245},{word: 'engineer', count: 3102}, {word: 'software', count: 3021}, ..]
I tried the following aggregation:
db.cvs.aggregate([
  {
    $project: {
      words: { $split: ["$jobs.jobTitle", " "] }
    }
  },
  {
    $unwind: {
      path: "$words"
    }
  },
  {
    $group: {
      _id: "$words",
      count: { $sum: 1 }
    }
  },
  { $sort: { "count": -1 } }
])
This results in the following error message:
$split requires an expression that evaluates to a string as a first argument, found: array
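Is it possible to first concatenate the string values of jobs.jobTitle into a single string within the aggregation? Or is there another way to get the expected result?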
Answer 0 (score: 0)
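Thanks to @NeilLunn for the quick comment.
I would like to share the corrected query with everyone. The fix is to $unwind the jobs array first, so that "$jobs.jobTitle" resolves to a single string rather than an array of strings, which is what $split expects: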
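db.cvs.aggregate([
  // One document per job, so "$jobs.jobTitle" is a single string
  { "$unwind": "$jobs" },
  {
    // Split each job title into an array of words
    $project: {
      words: { $split: ["$jobs.jobTitle", " "] }
    }
  },
  {
    // One document per word
    $unwind: {
      path: "$words"
    }
  },
  {
    // Count how often each word occurs across the collection
    $group: {
      _id: "$words",
      count: { $sum: 1 }
    }
  },
  // Most frequent words first
  { $sort: { "count": -1 } }
])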
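To sanity-check what the pipeline should return, the same logic can be mirrored in plain JavaScript on a few in-memory documents. This is only a rough sketch: the sample documents and the countTitleWords helper are made up for illustration and are not part of MongoDB or of the original data.

```javascript
// Hypothetical sample documents following the schema from the question.
const sampleCvs = [
  {
    firstName: "Ann",
    lastName: "Lee",
    jobs: [
      { jobTitle: "software architect" },
      { jobTitle: "full stack software engineer" },
    ],
  },
  {
    firstName: "Bob",
    lastName: "Ray",
    jobs: [
      { jobTitle: "javascript developer" },
      { jobTitle: "software engineer" },
    ],
  },
];

// Plain-JavaScript emulation of the pipeline stages (illustrative helper,
// not a MongoDB API).
function countTitleWords(cvs) {
  const counts = new Map();
  for (const cv of cvs) {
    // Equivalent of { $unwind: "$jobs" }: visit every job separately.
    for (const job of cv.jobs || []) {
      // Skip jobs without a string title in this sketch.
      if (typeof job.jobTitle !== "string") continue;
      // Equivalent of $split followed by { $unwind: "$words" }.
      for (const word of job.jobTitle.split(" ")) {
        // Equivalent of $group with count: { $sum: 1 }.
        counts.set(word, (counts.get(word) || 0) + 1);
      }
    }
  }
  // Equivalent of { $sort: { count: -1 } }, reshaped to { word, count }.
  return [...counts.entries()]
    .map(([word, count]) => ({ word, count }))
    .sort((a, b) => b.count - a.count);
}

const result = countTitleWords(sampleCvs);
console.log(result.slice(0, 2));
```

Note that the aggregation itself returns documents of the form { _id: 'software', count: 3 }. To get the { word, count } shape asked for in the question, a final stage such as { $project: { _id: 0, word: "$_id", count: 1 } } could be appended to the pipeline.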