Apache Spark DataFrame

Time: 2016-07-11 16:29:56

Tags: apache-spark spark-dataframe apache-spark-mllib

Spark version: 1.5.2, YARN 2.7.1.2.3.0.0-2557

While exploring data in spark-shell, I ran into a problem when trying to create a very wide DataFrame with 3000 columns. The code is as follows:

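import org.apache.spark.sql.functions.udf

// Look up dataItemId in the map; return NaN when the key is absent.
val valueFunctionUDF = udf((valMap: Map[String, String], dataItemId: String) =>
  valMap.get(dataItemId) match {
    case Some(v) => v.toDouble
    case None => Double.NaN
  })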
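
The lookup performed by the UDF body can be checked without Spark. The following is a minimal sketch, not part of the original session: `lookupValue` and `sample` are hypothetical names introduced only for illustration, and the logic mirrors the function passed to `udf` above.

```scala
// Hypothetical plain-Scala copy of the UDF body, for checking the lookup without Spark.
def lookupValue(valMap: Map[String, String], dataItemId: String): Double =
  valMap.get(dataItemId) match {
    case Some(v) => v.toDouble   // throws if v is null or not a valid number
    case None    => Double.NaN   // key absent from the map
  }

val sample = Map("a" -> "1.5", "b" -> "2")
println(lookupValue(sample, "a"))              // prints 1.5
println(lookupValue(sample, "missing").isNaN)  // prints true
```

Because the schema allows null map values (`valueContainsNull = true`), `v.toDouble` could throw for such entries; whether that matters depends on the data.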

s1 is the main DataFrame, and its schema is as follows:

|-- combKey: string (nullable = true)
|-- valMaps: map (nullable = true)
|    |-- key: string
|    |-- value: string (valueContainsNull = true)
After running this code:

dataItemIdVals.foreach { w =>
  s1 = s1.withColumn(w, valueFunctionUDF($"valMaps", $"combKey"))
}

my terminal just hangs at the statement above, and the following messages are printed:

16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on 172.22.49.20:41494 in memory (size: 7.6 KB, free: 5.2 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:43026 in memory (size: 7.6 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:44890 in memory (size: 7.6 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:52020 in memory (size: 7.6 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:33272 in memory (size: 7.6 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:48481 in memory (size: 7.6 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:44026 in memory (size: 7.6 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:34539 in memory (size: 7.6 KB, free: 5.0 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:43734 in memory (size: 7.6 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:42769 in memory (size: 7.6 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:60603 in memory (size: 7.6 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:59102 in memory (size: 7.6 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:47578 in memory (size: 7.6 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:43149 in memory (size: 7.6 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:52488 in memory (size: 7.6 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_3_piece0 on xxxxx:52298 in memory (size: 7.6 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO ContextCleaner: Cleaned accumulator 9
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on 172.22.49.20:41494 in memory (size: 7.3 KB, free: 5.2 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:33272 in memory (size: 7.3 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:59102 in memory (size: 7.3 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:44026 in memory (size: 7.3 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:42769 in memory (size: 7.3 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:43149 in memory (size: 7.3 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:43026 in memory (size: 7.3 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:52298 in memory (size: 7.3 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:42890 in memory (size: 7.3 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:47578 in memory (size: 7.3 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:60603 in memory (size: 7.3 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:43734 in memory (size: 7.3 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:48481 in memory (size: 7.3 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:52020 in memory (size: 7.3 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:52488 in memory (size: 7.3 KB, free: 5.1 GB)
 16/07/11 12:20:54 INFO BlockManagerInfo: Removed broadcast_2_piece0 on xxxxx:34539 in memory (size: 7.3 KB, free: 5.0 GB)
 16/07/11 12:20:54 INFO ContextCleaner: Cleaned accumulator 8
 16/07/11 12:20:54 INFO ContextCleaner: Cleaned shuffle 0
 16/07/11 12:20:54 INFO ContextCleaner: Cleaned accumulator 7
 16/07/11 12:20:54 INFO ContextCleaner: Cleaned accumulator 6
 16/07/11 12:20:54 INFO ContextCleaner: Cleaned accumulator 5
 16/07/11 12:20:54 INFO ContextCleaner: Cleaned accumulator 4

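Nothing shows up in the Spark UI. My guess is that Spark is computing some metadata (number of columns, etc.) for the new DataFrame. Has anyone seen this kind of problem before? Is there any way to work around it?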
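
One workaround that may be worth trying is to build all the new columns in a single `select` instead of 3000 chained `withColumn` calls. Each `withColumn` returns a new DataFrame whose plan is analyzed again on the driver, so a long chain can plausibly spend a lot of time there before any job is submitted, which would be consistent with an empty Spark UI. The sketch below makes two assumptions: `addLookupColumns` is a hypothetical helper name, and the lookup key is taken to be the column name itself (`lit(id)`) rather than `$"combKey"` as in the loop above.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, udf}

// Same lookup as valueFunctionUDF in the question.
val valueFunctionUDF = udf((valMap: Map[String, String], dataItemId: String) =>
  valMap.get(dataItemId) match {
    case Some(v) => v.toDouble
    case None    => Double.NaN
  })

// Hypothetical helper: adds one double column per id in a single projection.
// Assumes df has a map column named "valMaps", as in the schema above.
def addLookupColumns(df: DataFrame, ids: Seq[String]): DataFrame = {
  val existing = df.columns.map(col)  // keep every original column
  val added = ids.map(id => valueFunctionUDF(col("valMaps"), lit(id)).as(id))
  df.select(existing ++ added: _*)    // one select, so the plan is built once
}
```

In spark-shell this could be used as `val wide = addLookupColumns(s1, dataItemIdVals.toSeq)`, assuming `dataItemIdVals` is a collection of strings. Whether it removes the hang on Spark 1.5.2 has not been verified here.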

0 answers:

No answers