How do I pass a Python function to the pymongo map_reduce function (bson.code)?

Time: 2019-05-13 11:24:32

Tags: python function scope pymongo-3.x bson.code

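I have searched various forums but could not find an answer that addresses my problem.

I am using pymongo with bson.code to run a map-reduce over a MongoDB collection of documents named MediaSummary. The resulting collection should be a list of unique (non-stop) words, each with its associated document IDs.

For example, the collection data looks like this: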

{_id: "abc", VideoData: "This video contains documentary info about Universe."}
{_id: "def", VideoData: "This video is about Milkyway."}
{_id: "ghi", VideoData: "This video is about Earth and its importance."}
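
To make the intended result concrete, here is a minimal client-side sketch of the word-to-document-IDs mapping for the three sample documents. It is only an illustration of the target shape, not the map-reduce itself: the tiny `STOP_WORDS` set, the regex tokenizer, and the `build_word_index` helper are hypothetical stand-ins for the NLTK stop-word list and `word_tokenize`.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word set used only for this illustration;
# the real code would use NLTK's English stop-word list.
STOP_WORDS = {"this", "is", "about", "and", "its"}

# The three sample documents from the collection.
docs = [
    {"_id": "abc", "VideoData": "This video contains documentary info about Universe."},
    {"_id": "def", "VideoData": "This video is about Milkyway."},
    {"_id": "ghi", "VideoData": "This video is about Earth and its importance."},
]

def build_word_index(documents, stop_words):
    """Map each non-stop word to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc in documents:
        # Simple alphabetic tokenizer standing in for nltk.word_tokenize.
        for word in re.findall(r"[A-Za-z]+", doc.get("VideoData", "")):
            word = word.lower()
            if word not in stop_words:
                index[word].add(doc["_id"])
    return {word: sorted(ids) for word, ids in index.items()}

word_index = build_word_index(docs, STOP_WORDS)
```

With this stand-in stop-word set, `word_index["video"]` lists all three document IDs, while a word such as "earth" maps only to "ghi".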

Python code:

import pymongo
from nltk.corpus import stopwords 
from nltk.tokenize import word_tokenize

from pymongo import MongoClient
from bson.code import Code

client = MongoClient('mongodb://myuser:abc123@my-server/DataSummary')

db = client['DataSummary']

def getFilteredSentence(sentence):

    stop_words = set(stopwords.words('english')) 
    word_tokens = word_tokenize(sentence) 
    words = [word for word in word_tokens if word.isalpha()]  # remove punctuation
    filteredSentence = [] 

    for w in words:
        if (w and w.strip()):
            if w.lower() not in stop_words: 
                filteredSentence.append(w.strip().lower()) 

    return filteredSentence

Mapper function:

mapper = Code("""
           function map() {
            if (!this.VideoData) return;

            var key = this._id.toString();
            // getFS is meant to be a Python function passed in via scope.
            // It should filter the stop words out of the sentence.
            var value = getFS(this.VideoData);

            emit(key, value);
          }
          """)

Reducer function:

reducer = Code("""
               function reduce(key, values) {
                   var uniqueWordsWithDocIds = [];
                   // Some further processing on values

                   return uniqueWordsWithDocIds;
               }
               """)

Calling map_reduce on the MediaSummary collection:

output_collection = db.MediaSummary.map_reduce(
    mapper, reducer, 'SearchWords',
    scope = {
        'getFS': somePyThonFuncForFilteringOutStopWords  # This must be a Python function that I can make use of in my mapper function
    })
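
As far as I understand, the map and reduce functions run as JavaScript inside the MongoDB server, and the values in `scope` have to be BSON-serializable, so a Python callable such as getFilteredSentence cannot be handed over as-is. One possible direction, sketched below under that assumption, is to pass the stop words as plain data and do the filtering in JavaScript. The names `MAPPER_SOURCE`, `build_scope`, and `stopWords` are hypothetical, the regex only approximates `word_tokenize`, and wrapping the words in an object is just one choice for the emitted value.

```python
# JavaScript source for the mapper. It reads the `stopWords` array from the
# scope instead of calling a Python function.
MAPPER_SOURCE = """
function map() {
    if (!this.VideoData) return;
    var tokens = this.VideoData.toLowerCase().match(/[a-z]+/g) || [];
    var words = tokens.filter(function (w) {
        return stopWords.indexOf(w) === -1;
    });
    emit(this._id.toString(), {words: words});
}
"""

def build_scope(stop_words):
    """Return a scope dict that holds only plain data (no callables)."""
    return {"stopWords": sorted({word.lower() for word in stop_words})}

# In the real code the list would come from stopwords.words('english').
scope = build_scope(["this", "is", "about", "and", "its"])

# With PyMongo 3.x this would then be used roughly as:
#   from bson.code import Code
#   db.MediaSummary.map_reduce(Code(MAPPER_SOURCE), reducer, 'SearchWords', scope=scope)
```

The sketch itself does not import pymongo, so it can be run without a server; the commented call at the end mirrors the map_reduce invocation above.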

I need help with this. Please advise.

0 answers:

No answers