Filtering an RDD using Clojure and flambo

Time: 2015-08-03 20:43:35

Tags: clojure apache-spark

My indexed RDD, (:rdd xctx), has the form:

[[["1" "32" "44" "55" "14"] 0] [["21" "23" "24" "25" "24"] 1] [["41" "53" "54" "5" "24"] 2] [["11" "35" "34" "15" "64"] 3]]

I want to filter out of the RDD the rows whose index appears in a vector, for example:

:rows-list [1 3]

I tried this, but somehow I get an error:

(defn remove-index-rows
 "Function to catch the row(s) with the specific Row Number(s) in rows-list
  input = { :rows-list [ val(s)]}"
  [row input]
  (let [{:keys [ rows-list ]} input
    row-and-index (f/collect (f/filter #(= row (get % 0)) (:rdd xctx)))]
    (when-not (some #(= (get row-and-index 1) %) rows-list) row)))

The desired output is:

 [ [["1" "32" "44" "55" "14"] 0] [["41" "53" "54" "5" "24"] 2] ]

Thanks for your help.

1 Answer:

Answer 0: (score: 1)

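For starters, I would replace rows-list with a set, so that membership checks are a direct lookup. Let's define row-set as follows:

(def row-set (set rows-list))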

After that you can filter like this:

(f/filter
 (:rdd xctx)
 (f/fn [row] (let [[v i] row] (not (contains? row-set i)))))
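
The same predicate can be tried at a REPL without a Spark context. The following is only a sketch under assumptions: indexed-rows is a plain Clojure vector standing in for (:rdd xctx), keep-row? is a hypothetical helper name, and core filterv takes the place of f/filter.

```clojure
;; Assumption: a local vector stands in for the indexed RDD from the question.
(def indexed-rows
  [[["1" "32" "44" "55" "14"] 0]
   [["21" "23" "24" "25" "24"] 1]
   [["41" "53" "54" "5" "24"] 2]
   [["11" "35" "34" "15" "64"] 3]])

;; Indices of the rows to drop, converted to a set for direct lookup.
(def rows-list [1 3])
(def row-set (set rows-list))

;; Hypothetical helper: true when the row's index is NOT in row-set.
;; Each element is destructured as [values index].
(defn keep-row?
  [[_values index]]
  (not (contains? row-set index)))

;; filterv keeps only the elements for which the predicate is truthy.
(def kept (filterv keep-row? indexed-rows))
;; kept => [[["1" "32" "44" "55" "14"] 0] [["41" "53" "54" "5" "24"] 2]]
```

With flambo, the body of keep-row? would go inside the f/fn passed to f/filter, as in the answer above.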