I have a JSON file that looks like this:
{
item-1: {propertyA: "blabla", propertyB: "blabla", propertyC: "blabla"},
item-2: {propertyA: "blabla", propertyB: "blabla", propertyC: "blabla"},
... ... ... ... ... ... ... ,
item-30000: {propertyA: "blabla", propertyB: "blabla", propertyC: "blabla"}
}
How can I load this into Spark? I am using Scala and IntelliJ IDEA.
Loading takes far too long and eventually fails with an error like this:
WARN Executor: Issue communicating with driver in heartbeater
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds].
This timeout is controlled by spark.executor.heartbeatInterval
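*The error above occurs after I try to define the schema explicitly by creating 30,000 StructFields in a StructType. When I let Spark infer the schema implicitly, I get a similar but even bigger problem. Any help is appreciated.
*In the actual file, all of these records are on one long line. I split them across lines here only to make the structure easier to read.
After 30 minutes the job ends with: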
/* 3002032 */ apply20000_14999(i);
/* 3002033 */ result.setTotalSize(holder.totalSize());
/* 3002034 */ return result;
/* 3002035 */ }
/* 3002036 */ }
org.codehaus.janino.JaninoRuntimeException: Constant pool for class org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection has grown past JVM limit of 0xFFFF
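
Since every item carries the same three properties, one direction that may be worth trying is to treat the top-level object as a map from item name to a small struct, instead of a struct with 30,000 fields. This is only a sketch under assumptions, not a confirmed fix. It assumes that the real file is valid JSON with quoted keys, that it sits on a single line as described above, and that the Spark version in use has a from_json that accepts a MapType schema (older releases do not). The object name LoadItems and the path items.json are hypothetical placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, from_json}
import org.apache.spark.sql.types.{MapType, StringType, StructField, StructType}

// Hypothetical entry point; the name is illustrative only.
object LoadItems {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("load-items")
      .master("local[*]") // assumption: local run from the IDE
      .getOrCreate()

    // Schema of ONE item: three string properties, as in the sample above.
    val itemSchema = StructType(Seq(
      StructField("propertyA", StringType),
      StructField("propertyB", StringType),
      StructField("propertyC", StringType)
    ))

    // The text source yields one row per line in a column named "value".
    // With the whole file on one line, this is a single row.
    val raw = spark.read.text("items.json") // hypothetical path

    val items = raw
      // Parse the line as a map: item name -> item struct.
      // explode on a map column produces the columns "key" and "value".
      .select(explode(from_json(col("value"), MapType(StringType, itemSchema))))
      // One row per item: the item name plus its three properties as columns.
      .select(col("key").as("item"), col("value.*"))

    items.printSchema()
    items.show(5, truncate = false)

    spark.stop()
  }
}
```

With this shape the DataFrame has four columns and roughly 30,000 rows, so the generated projection code stays small regardless of how many items the file contains.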