Spark: Querying MongoDB with Stratio and RDDs

Posted: 2016-02-19 13:47:40

Tags: mongodb apache-spark stratio

I'm using Stratio (0.11) to query MongoDB from Spark. I want to work with RDDs, not DataFrames.

This is what I'm doing at the moment:

val mongoRDD = new MongodbRDD(sqlContext, readConfig, new MongodbPartitioner(readConfig))
mongoRDD.foreach(println)

This prints the contents of the collection correctly.

Is there a way, using Stratio, to apply a query (either a String or one built with QueryBuilder) to a MongodbRDD? In my case the query is a $near query.

1 Answer

Answer 0 (score: 3)

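As @zero323 hinted, the way to do this is through the `filters` parameter. The library inspects these filters and maps each one onto the corresponding method of MongoDB's QueryBuilder.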

From the Spark-MongoDB source code:

sFilters.foreach {
    case EqualTo(attribute, value) =>
      queryBuilder.put(attribute).is(checkObjectID(attribute, value))
    case GreaterThan(attribute, value) =>
      queryBuilder.put(attribute).greaterThan(checkObjectID(attribute, value))
    case GreaterThanOrEqual(attribute, value) =>
      queryBuilder.put(attribute).greaterThanEquals(checkObjectID(attribute, value))
    case In(attribute, values) =>
      queryBuilder.put(attribute).in(values.map(value => checkObjectID(attribute, value)))
    case LessThan(attribute, value) =>
      queryBuilder.put(attribute).lessThan(checkObjectID(attribute, value))
    case LessThanOrEqual(attribute, value) =>
      queryBuilder.put(attribute).lessThanEquals(checkObjectID(attribute, value))
    case IsNull(attribute) =>
      queryBuilder.put(attribute).is(null)
    case IsNotNull(attribute) =>
      queryBuilder.put(attribute).notEquals(null)
    case And(leftFilter, rightFilter) if !parentFilterIsNot =>
      queryBuilder.and(filtersToDBObject(Array(leftFilter)), filtersToDBObject(Array(rightFilter)))
    case Or(leftFilter, rightFilter)  if !parentFilterIsNot =>
      queryBuilder.or(filtersToDBObject(Array(leftFilter)), filtersToDBObject(Array(rightFilter)))
    case StringStartsWith(attribute, value) if !parentFilterIsNot =>
      queryBuilder.put(attribute).regex(Pattern.compile("^" + value + ".*$"))
    case StringEndsWith(attribute, value) if !parentFilterIsNot =>
      queryBuilder.put(attribute).regex(Pattern.compile("^.*" + value + "$"))
    case StringContains(attribute, value) if !parentFilterIsNot =>
      queryBuilder.put(attribute).regex(Pattern.compile(".*" + value + ".*"))
    case Not(filter) =>
      filtersToDBObject(Array(filter), true)
  }

As you can see, `near` is not handled, but it looks easy to add to the connector, since QueryBuilder offers methods to use that MongoDB operator.
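To make the idea concrete, here is a minimal sketch of that dispatch pattern with a `$near` case added. The `Filter` types and the `render`/`toQuery` functions are hypothetical stand-ins, not the connector's classes. The sketch also builds the query as a JSON-like string instead of going through `QueryBuilder`, so it runs with only the Scala standard library. It emits the legacy coordinate-pair form `{field: {$near: [x, y]}}`.

```scala
// Hypothetical, simplified model of pushed-down filters (NOT the connector's own classes).
sealed trait Filter
final case class EqualTo(attribute: String, value: Any) extends Filter
final case class GreaterThan(attribute: String, value: Any) extends Filter
final case class Near(attribute: String, x: Double, y: Double) extends Filter
final case class And(left: Filter, right: Filter) extends Filter

// Render a value as JSON-like text: strings are quoted, everything else uses toString.
def render(value: Any): String = value match {
  case s: String => "\"" + s + "\""
  case other     => other.toString
}

// Translate a filter into a MongoDB query document, one case per filter type,
// mirroring the pattern match in the connector. "$$" produces a literal "$".
def toQuery(filter: Filter): String = filter match {
  case EqualTo(a, v)     => s"""{"$a": ${render(v)}}"""
  case GreaterThan(a, v) => s"""{"$a": {"$$gt": ${render(v)}}}"""
  case Near(a, x, y)     => s"""{"$a": {"$$near": [$x, $y]}}"""
  case And(l, r)         => s"""{"$$and": [${toQuery(l)}, ${toQuery(r)}]}"""
}

println(toQuery(Near("x", 3.0, 4.0)))
```

In the real connector, the new case would presumably call `queryBuilder.put(attribute).near(x, y)` rather than building a string.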

You could try modifying the connector yourself. In any case, I'll try to implement it over the next few days and open a PR.

Edit

A PR has been opened that adds a source filter type describing $near, so you can use MongodbRDD like this:

val mongoRDD = new MongodbRDD(
    sqlContext,
    readConfig,
    new MongodbPartitioner(readConfig),
    // Pushed down to MongoDB as a $near query on field "x"
    filters = FilterSection(Array(Near("x", 3.0, 4.0))))