SparkSQL and explode on a DataFrame in Java

Asked: 2015-08-06 15:03:20

Tags: java apache-spark apache-spark-sql

Is there a simple way to use explode on an array column of a SparkSQL DataFrame? It is relatively simple in Scala, but this function appears to be unavailable in Java (as mentioned in the javadoc).

One option is to use SQLContext.sql(...) with explode inside the query, but I am looking for a better, cleaner way. The DataFrames are loaded from parquet files.

2 answers:

Answer 0 (score: 15)

I solved it in this manner: say you have an array column named "positions", holding the job positions for each person, who has a "fullName".

Then you go from the initial schema:

root
 |-- fullName: string (nullable = true)
 |-- positions: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- companyName: string (nullable = true)
 |    |    |-- title: string (nullable = true)
...

to the schema:

root
 |-- personName: string (nullable = true)
 |-- companyName: string (nullable = true)
 |-- positionTitle: string (nullable = true)

by doing:

    DataFrame personPositions = persons.select(
        persons.col("fullName").as("personName"),
        org.apache.spark.sql.functions.explode(persons.col("positions")).as("pos"));

    DataFrame test = personPositions.select(
        personPositions.col("personName"),
        personPositions.col("pos").getField("companyName").as("companyName"),
        personPositions.col("pos").getField("title").as("positionTitle"));
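For intuition, explode behaves like a flatMap: each element of the array becomes its own output row, and the non-array columns are duplicated onto every one of those rows. Here is a minimal plain-Java sketch of that semantics, with no Spark dependency; the Person and Position classes are made up here to mirror the schema above:

```java
import java.util.ArrayList;
import java.util.List;

public class ExplodeSketch {
    // Hypothetical row types standing in for the schema above; not Spark classes.
    static class Position {
        final String companyName;
        final String title;
        Position(String companyName, String title) {
            this.companyName = companyName;
            this.title = title;
        }
    }

    static class Person {
        final String fullName;
        final List<Position> positions;
        Person(String fullName, List<Position> positions) {
            this.fullName = fullName;
            this.positions = positions;
        }
    }

    // explode semantics: emit one flat row per array element,
    // repeating the scalar column (fullName) on every row.
    static List<String[]> explode(List<Person> persons) {
        List<String[]> rows = new ArrayList<>();
        for (Person p : persons) {
            for (Position pos : p.positions) {
                rows.add(new String[] { p.fullName, pos.companyName, pos.title });
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Person> persons = List.of(
            new Person("Ada Lovelace", List.of(
                new Position("Analytical Engines Inc", "Programmer"),
                new Position("Babbage & Co", "Consultant"))));
        for (String[] row : explode(persons)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

A person with two positions thus produces two flat rows, each repeating the person's name, which is exactly what the select above does at DataFrame scale.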

Answer 1 (score: 6)

It seems possible to use a combination of org.apache.spark.sql.functions.explode(Column col) and DataFrame.withColumn(String colName, Column col) to replace the column with its exploded version.
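Concretely, that combination would look something like the following sketch against the Spark 1.x Java API (untested here; `df` is assumed to be a DataFrame whose array column is named "positions", as in the other answer):

```java
import static org.apache.spark.sql.functions.explode;

// Replace the array column with its exploded version: one row per element,
// keeping the original column name, so downstream code is unchanged.
DataFrame exploded = df.withColumn("positions", explode(df.col("positions")));
```

Since withColumn overwrites an existing column when given the same name, this keeps the rest of the schema intact while flattening the array.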