Spark DataFrame groupBy and aggregation throwing NegativeArraySizeException

Date: 2016-06-09 16:26:04

Tags: exception apache-spark dataframe

I am running the following query on a Spark DataFrame:

  input
    .select("id")
    .groupBy("id")
    .agg(count("*").as("count"))

and I am getting a java.lang.NegativeArraySizeException:

  at org.apache.spark.unsafe.types.UTF8String.getBytes(UTF8String.java:234)
  at org.apache.spark.unsafe.types.UTF8String.toString(UTF8String.java:827)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply(Unknown Source)
  at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator$$anonfun$generateProcessRow$1.apply(TungstenAggregationIterator.scala:276)
  at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator$$anonfun$generateProcessRow$1.apply(TungstenAggregationIterator.scala:273)
  at org.apache.spark.sql.execution.aggregate.TungstenAggregationIterator.processInputs(TungstenAggregationIterator.scala:533)

1 answer:

Answer 0 (score: 0)

The following should work:

  input.groupBy("id").count()
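
To make the answer concrete, here is a minimal, self-contained sketch. Everything besides the two groupBy calls is an assumption for illustration: the local SparkSession (which requires Spark 2.x; on 1.x you would build a SQLContext instead) and the toy id column. groupBy("id").count() is simply a built-in shorthand that yields a per-id count column named "count", equivalent in result to the agg(count("*")) form from the question.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.count

    // Hypothetical local session and sample data, only for illustration.
    val spark = SparkSession.builder()
      .appName("groupby-count-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val input = Seq("a", "a", "b").toDF("id")

    // Form used in the question: explicit count("*") aggregate.
    val withAgg = input.groupBy("id").agg(count("*").as("count"))

    // Form suggested in the answer: the built-in count() shortcut,
    // which also produces a column named "count".
    val withCount = input.groupBy("id").count()

    withCount.show()  // "a" -> 2, "b" -> 1 (row order may vary)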