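How do I remove empty DataFrames from a sequence of DataFrames? In the code snippet below, twoColDF contains many empty DataFrames. I have a second question about the for loop below: is there a way to make it more efficient? I tried rewriting it as the following line, but could not get it to work: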
//finalDF2 = (1 until colCount).flatMap(j => groupCount(j).map( y=> finalDF.map(a=>a.filter(df(cols(j)) === y)))).toSeq.flatten
var twoColDF: Seq[Seq[DataFrame]] = null
if (colCount == 2) {
  val i = 0
  for (j <- i + 1 until colCount) {
    twoColDF = groupCount(j).map(y => {
      finalDF.map(x => x.filter(df(cols(j)) === y))
    })
  }
}
finalDF = twoColDF.flatten
Answer 0 (score: 1)
Given a collection of DataFrames, you can access each DataFrame's underlying RDD and use isEmpty to filter out the empty ones:
val input: Seq[DataFrame] = ???
val result = input.filter(!_.rdd.isEmpty())
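The check above goes through the DataFrame's RDD. As a possible alternative, here is a minimal sketch that fetches at most one row with `head(1)` instead. The object name `DropEmptyExample`, the helper `dropEmpty`, and the sample data are hypothetical, and a local SparkSession is assumed only to make the example self-contained.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DropEmptyExample {
  // Hypothetical helper: keeps only the DataFrames that have at least one row.
  // head(1) asks Spark for at most one row, so no full count is computed.
  def dropEmpty(dfs: Seq[DataFrame]): Seq[DataFrame] =
    dfs.filter(df => !df.head(1).isEmpty)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used here only for illustration.
    val spark = SparkSession.builder().master("local[*]").appName("drop-empty").getOrCreate()
    import spark.implicits._

    val full = Seq(("a", 1), ("b", 2)).toDF("key", "value")
    val empty = full.filter($"key" === "z") // no row has key "z"

    val kept = dropEmpty(Seq(full, empty))
    println(kept.size) // only `full` remains

    spark.stop()
  }
}
```

On Spark 2.4 and later, `Dataset.isEmpty` should also be available, so `dfs.filter(!_.isEmpty)` may be enough. Either way, each check runs a Spark job per DataFrame.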
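As for your other question - I can't tell what your code is trying to do, but I would start by making it more functional (removing the use of var and the imperative conditional). If I'm guessing the meaning of your inputs correctly, something like this might be equivalent to what you're attempting: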
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

val input: Seq[DataFrame] = ???
// map of column index to column values -
// for each combination we'd want a new DF where that column has that value
// I'm assuming values are Strings, can be anything else
val groupCount: Map[Int, Seq[String]] = ???
// for each combination of DF + column + value - produce the filtered DF where this column has this value
val perValue: Seq[DataFrame] = for {
  df <- input
  index <- groupCount.keySet
  value <- groupCount(index)
} yield df.filter(col(df.columns(index)) === value)
// remove empty results:
val result: Seq[DataFrame] = perValue.filter(!_.rdd.isEmpty())
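The for-comprehension above desugars to nested flatMap/map calls, which is roughly the shape the one-liner in the question was aiming for. A sketch of that form follows. The object `PerValueExample` and the helper `filterPerValue` are hypothetical names, and the meaning of `groupCount` is the same guess as above (column index to candidate values).

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object PerValueExample {
  // Hypothetical helper: builds one filtered DataFrame for every
  // (DataFrame, column index, value) combination.
  def filterPerValue(input: Seq[DataFrame],
                     groupCount: Map[Int, Seq[String]]): Seq[DataFrame] =
    input.flatMap { df =>
      // toSeq keeps the intermediate collection a Seq rather than a Set
      groupCount.toSeq.flatMap { case (index, values) =>
        values.map(value => df.filter(col(df.columns(index)) === value))
      }
    }
}
```

The filters themselves are lazy, so building the sequence is cheap. The cost shows up in the emptiness check, which triggers one job per filtered DataFrame, so caching the input DataFrames beforehand may help if they are expensive to recompute.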