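I am trying to split this string so that I can later use map reduce to count how many words of each length it contains.
For example, for the sentence
SUPPOSING that Truth is a woman--what then? I would get: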
[
{length:"1", number:"1"},
{length:"2", number:"1"},
{length:"4", number:"3"},
{length:"5", number:"2"},
{length:"9", number:"1"}
]
How can I do this?
Answer 0 (score: 0)
This sample aggregation counts words of the same length. I hope it helps:
db.some.remove({})
db.some.save({str:"red brown fox jumped over the hil"})
var res = db.some.aggregate(
[
{ $project : { word : { $split: ["$str", " "] }} },
{ $unwind : "$word" },
{ $project : { len : { $strLenCP: "$word" }} },
{ $group : { _id : { len : "$len"}, same: {$push:"$len"}}},
{ $project : { len : "$_id.len", count : {$size : "$same"} }} // after $group the length is stored in _id.len
]
)
printjson(res.toArray());
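To sanity-check what this pipeline should return, its stages can be mirrored in plain JavaScript without a database. This is only a sketch: `countWordLengths` is a hypothetical helper name, and the final sort is an addition for stable output that the pipeline itself does not perform.

```javascript
// Plain-JavaScript mirror of the aggregation stages (no MongoDB required).
// countWordLengths is a hypothetical helper, not part of any MongoDB API.
function countWordLengths(str) {
  const counts = new Map();
  for (const word of str.split(" ")) {           // like $split followed by $unwind
    const len = Array.from(word).length;         // like $strLenCP: counts code points
    counts.set(len, (counts.get(len) || 0) + 1); // like $group with a per-length count
  }
  // Sorting is an extra step added here so the output order is predictable.
  return Array.from(counts, ([len, count]) => ({ len, count }))
    .sort((a, b) => a.len - b.len);
}

console.log(countWordLengths("red brown fox jumped over the hil"));
// → [ { len: 3, count: 4 }, { len: 4, count: 1 }, { len: 5, count: 1 }, { len: 6, count: 1 } ]
```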
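Answer 1 (score: 0)
The answer to your question depends largely on how you define a word. If a word is simply a contiguous sequence of A-Z or a-z characters, then here is a completely crazy approach that nevertheless gives you exactly the result you asked for.
In effect, this code replaces every character outside the 'A' to 'z' range with a space, splits the cleansed string on spaces, maps each word to its length, and then groups by length and counts.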
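The character-cleansing step is the least obvious part, so here is a minimal plain-JavaScript sketch of the same idea. The names `cleanse` and `wordLengths` are hypothetical, and the sketch assumes strings are compared by character code, as the `$cmp` checks against "@" and "{" do.

```javascript
// Hypothetical helpers that mirror the cleansing idea in plain JavaScript.
function cleanse(text) {
  return Array.from(text)
    // Keep characters strictly between "@" (64) and "{" (123); replace the rest with a space.
    .map((ch) => (ch > "@" && ch < "{" ? ch : " "))
    .join("");
}

function wordLengths(text) {
  return cleanse(text)
    .split(" ")               // consecutive spaces yield empty strings
    .map((w) => w.length)     // map each word to its length
    .filter((n) => n !== 0);  // drop the empty strings left over by the split
}

console.log(JSON.stringify(cleanse("a woman--what then?"))); // "a woman  what then "
console.log(wordLengths("a woman--what then?"));             // [ 1, 5, 4, 4 ]
```

Note that this range check also lets through the six ASCII characters that sit between 'Z' and 'a' (such as `[` and `_`), so a stricter definition of a word would need two separate ranges.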
Given the following input document
{
"text" : "SUPPOSING that Truth is a woman--what then?"
}
the following pipeline
db.collection.aggregate({
    $project: { // lots of magic to calculate an array that will hold the lengths of all words
        "lengths": {
            $map: { // translate a given word into its length
                input: {
                    $split: [ // split cleansed string by space character
                        { $reduce: { // join the characters that are between A and z
                            input: {
                                $map: { // to traverse the original input string character by character
                                    input: {
                                        $range: [ 0, { $strLenCP: "$text" } ] // we want to traverse the entire string from index 0 all the way until the last character
                                    },
                                    as: "index",
                                    in: {
                                        $let: {
                                            vars: {
                                                "char": { // temp. result which will be reused several times below
                                                    $substrCP: [ "$text", "$$index", 1 ] // the single character we look at in this loop
                                                }
                                            },
                                            in: {
                                                $cond: [ // some value that depends on whether the character we look at is between 'A' and 'z'
                                                    { $and: [
                                                        { $eq: [ { $cmp: [ "$$char", "@" /* ASCII 64, 65 would be 'A' */] }, 1 ] }, // is our character greater than or equal to 'A'
                                                        { $eq: [ { $cmp: [ "$$char", "{" /* ASCII 123, 122 would be 'z' */] }, -1 ] } // is our character less than or equal to 'z'
                                                    ]},
                                                    '$$char', // in which case that character will be taken
                                                    ' ' // and otherwise a space character to add a word boundary
                                                ]
                                            }
                                        }
                                    }
                                }
                            },
                            initialValue: "", // starting with an empty string
                            in: {
                                $concat: [ // we join all array values by means of concatenating
                                    "$$value", // the current value with
                                    "$$this"
                                ]
                            }
                        }
                        },
                        " "
                    ]
                },
                as: "word",
                in: {
                    $strLenCP: "$$word" // we map a word into its length, e.g. "the" --> 3
                }
            }
        }
    }
}, {
    $unwind: "$lengths" // flatten the array which holds all our word lengths
}, {
    $group: {
        _id : "$lengths", // group by the length of our words
        "number": { $sum: 1 } // count number of documents per group
    }
}, {
    $match: {
        "_id": { $ne: 0 } // $split might leave us with strings of length 0 which we do not want in the result
    }
}, {
    $project: {
        "_id": 0, // remove the "_id" field
        "length" : "$_id", // length is our group key
        "number" : "$number" // and this is the number of findings
    }
}, {
    $sort: { "length": 1 } // sort by length ascending
})
will produce the desired output
[
{ "length" : 1, "number" : 1.0 },
{ "length" : 2, "number" : 1.0 },
{ "length" : 4, "number" : 3.0 },
{ "length" : 5, "number" : 2.0 },
{ "length" : 9, "number" : 1.0 }
]
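Since the question mentions map reduce, the same counting can also be sketched as a map step followed by a reduce step in plain JavaScript. This is an illustration only, not MongoDB's mapReduce command: `mapWords` and `reduceCounts` are hypothetical names, and the regular expression assumes a word is a run of A-Z or a-z letters, which is slightly stricter than the range check used in the pipeline.

```javascript
// Hypothetical map step: emit a [length, 1] pair for every word in the text.
function mapWords(text) {
  // Assumption: a word is a contiguous run of ASCII letters.
  return (text.match(/[A-Za-z]+/g) || []).map((word) => [word.length, 1]);
}

// Hypothetical reduce step: sum the emitted values per length key.
function reduceCounts(pairs) {
  const totals = {};
  for (const [length, n] of pairs) {
    totals[length] = (totals[length] || 0) + n;
  }
  return Object.keys(totals)
    .map(Number)
    .sort((a, b) => a - b) // ascending by length, like the $sort stage
    .map((length) => ({ length, number: totals[length] }));
}

const sampleText = "SUPPOSING that Truth is a woman--what then?";
console.log(reduceCounts(mapWords(sampleText)));
// → lengths 1, 2, 4, 5, 9 with counts 1, 1, 3, 2, 1
```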