Currently I am doing:
class UrlBuilder {
  // ...
  _iterateAndBuild(current_val, index, array) {
    // ...
  }
  buildServiceArr() {
    DATA.forEach(this._iterateAndBuild, this);
  }
}
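But this is slow. Can someone suggest a better way to achieve this? Basically, I am looking for the rows of DF whose itemId is not in tempDF (I tried using group, which gave me only the unique rows, but I want to keep the duplicates). DF and tempDF are built like this:
val DF = sqlSession.sql("select itemIdDig as itemId, "
  + "title, "
  + "timestamp as time "
  + "from itemTable ")
val tempDF = sqlSession.sql("select itemIdDig as itemId "
  + "from itemTable "
  + "group by itemIdDig HAVING count(*) >= 10 ").rdd.map(r => r(0)).collect()
// keep the rows of DF whose itemId is not in tempDF
DF.filter(!col("itemId").isin(tempDF : _*)).toDF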
Answer 0 (score: 2)
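Use a left anti join, keeping tempDF as a DataFrame (that is, without the .rdd.map(r => r(0)).collect() step):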
DF.join(tempDF, Seq("itemId"), "leftanti")
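For illustration, here is a minimal, self-contained sketch of the same idea. It is not the asker's actual job: the sample rows, the local SparkSession, the helper name dropFrequentItems and the threshold of 3 are all assumptions made up for the example. It only shows that a "leftanti" join drops the matching itemIds while keeping duplicate rows of everything else, and that nothing has to be collected to the driver.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object LeftAntiSketch {
  // Hypothetical helper: drops every row whose itemId occurs at least
  // minCount times and keeps all rows (duplicates included) of the rest.
  def dropFrequentItems(df: DataFrame, minCount: Long): DataFrame = {
    // Same role as tempDF in the question, but left as a DataFrame.
    val frequent = df.groupBy("itemId").count()
      .filter(col("count") >= minCount)
      .select("itemId")
    // A left anti join keeps only the left rows with no match on the right.
    df.join(frequent, Seq("itemId"), "leftanti")
  }

  def main(args: Array[String]): Unit = {
    // Local session for the example only; the question uses sqlSession.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("left-anti-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for itemTable.
    val items = Seq(
      (1, "a", 100L), (1, "a", 101L),                 // itemId 1: 2 rows
      (2, "b", 102L),                                 // itemId 2: 1 row
      (3, "c", 103L), (3, "c", 104L), (3, "c", 105L)  // itemId 3: 3 rows
    ).toDF("itemId", "title", "time")

    // With minCount = 3 only itemId 3 is removed; both rows of itemId 1 stay.
    val kept = dropFrequentItems(items, minCount = 3L)
    kept.show()

    spark.stop()
  }
}
```

Compared with isin over a collected array, the join stays distributed, which usually matters once the list of excluded ids gets large.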