java.lang.StackOverflowError when using Apache Spark

Posted: 2018-09-24 03:44:06

Tags: java apache-spark stack-overflow

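I am trying to read about 400 files from a corpus one at a time and split each file into words. I then map each word to a (key, value) pair of the form ((fileName, word), 1), but I get a java.lang.StackOverflowError. I am using Apache Spark from Java.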

    for (int i = 1; i <= 400; i++) {
        String fileNames = fileName + "/New" + i + ".txt";
        lines = sc.textFile(fileNames);
        words = lines.flatMap(s -> Arrays.asList(s.split(" ")).iterator());

        JavaPairRDD<Tuple2<String, String>, Integer> file = words.mapToPair(new PairFunction<String, Tuple2<String, String>, Integer>() {
            @Override
            public Tuple2<Tuple2<String, String>, Integer> call(String s) throws Exception {
                return new Tuple2<Tuple2<String, String>, Integer>(new Tuple2<String, String>(fileNames, s), 1);
            }
        });

        if (finalCorpus != null) {
            finalCorpus = finalCorpus.union(file);
        } else {
            finalCorpus = file;
        }
    }
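For reference, the pair layout the loop is meant to produce can be sketched in plain Java without Spark. This is only an illustration under stated assumptions: `SimpleEntry` stands in for `scala.Tuple2`, the class name `FileWordPairs` and the sample file names and contents are hypothetical, and it uses the same single-space tokenization as the question's `split(" ")`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FileWordPairs {

    // Builds ((fileName, word), 1) pairs, the same shape mapToPair produces in the question.
    // SimpleEntry is used instead of scala.Tuple2 so the sketch needs no Spark dependency.
    public static List<Map.Entry<Map.Entry<String, String>, Integer>> toPairs(Map<String, String> files) {
        List<Map.Entry<Map.Entry<String, String>, Integer>> pairs = new ArrayList<>();
        for (Map.Entry<String, String> file : files.entrySet()) {
            // Same tokenization as the question: split on a single space.
            for (String word : file.getValue().split(" ")) {
                Map.Entry<String, String> key = new SimpleEntry<>(file.getKey(), word);
                pairs.add(new SimpleEntry<>(key, 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory "files"; names and contents are made up for illustration.
        Map<String, String> files = new LinkedHashMap<>();
        files.put("corpus/New1.txt", "a b a");
        files.put("corpus/New2.txt", "b");
        for (Map.Entry<Map.Entry<String, String>, Integer> p : toPairs(files)) {
            System.out.println(p.getKey() + " -> " + p.getValue());
        }
    }
}
```

Each word occurrence yields its own pair with value 1, so repeated words in the same file produce duplicate keys that a later reduce step would sum.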

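What is the best way to fix this? I have 5 GB of memory available, so this error seems unreasonable for this number of files. The stack trace is below: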

Exception in thread "main" java.lang.StackOverflowError
at java.io.ObjectStreamClass$FieldReflector.getPrimFieldValues(ObjectStreamClass.java:2002)
at java.io.ObjectStreamClass.getPrimFieldValues(ObjectStreamClass.java:1277)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:468)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at scala.collection.immutable.List$SerializationProxy.writeObject(List.scala:468)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1028)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
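The trace repeats the same block of `ObjectOutputStream` frames around `scala.collection.immutable.List$SerializationProxy.writeObject`, which suggests Java serialization is recursing through a deeply nested object graph. A plausible cause (an assumption, not confirmed in the question) is the loop itself: each `finalCorpus.union(file)` makes the previous result a parent of a new RDD, so 400 iterations build a lineage roughly 400 levels deep, and serializing it costs a block of stack frames per level. The sketch below does not use Spark; the hypothetical `Node` class only models parent links, to compare folding `union` in a loop with collecting all inputs first and unioning them in one step (Spark's `SparkContext.union` and `JavaSparkContext.union` accept several RDDs at once).

```java
import java.util.ArrayList;
import java.util.List;

public class UnionDepth {

    // Hypothetical stand-in for an RDD: it only remembers its parents (its lineage).
    // This is NOT Spark's RDD class.
    public static final class Node {
        final List<Node> parents;

        Node(List<Node> parents) {
            this.parents = parents;
        }

        static Node source() {
            return new Node(new ArrayList<>());
        }

        // Models rdd.union(other): a new node whose two parents are the inputs.
        Node union(Node other) {
            List<Node> ps = new ArrayList<>();
            ps.add(this);
            ps.add(other);
            return new Node(ps);
        }

        // Models a single many-way union over all inputs at once.
        static Node unionAll(List<Node> inputs) {
            return new Node(new ArrayList<>(inputs));
        }

        // Longest path from this node down to a source. A recursive walk like this
        // (or like Java serialization) needs stack space proportional to the depth.
        public int depth() {
            int max = 0;
            for (Node p : parents) {
                max = Math.max(max, p.depth() + 1);
            }
            return max;
        }
    }

    // Same shape as the question's loop: fold each new input into the running union.
    public static Node chainedUnion(int n) {
        Node acc = null;
        for (int i = 1; i <= n; i++) {
            Node file = Node.source();
            acc = (acc == null) ? file : acc.union(file);
        }
        return acc;
    }

    // Alternative shape: collect all inputs first, then union them in one step.
    public static Node flatUnion(int n) {
        List<Node> files = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            files.add(Node.source());
        }
        return Node.unionAll(files);
    }

    public static void main(String[] args) {
        System.out.println("chained depth: " + chainedUnion(400).depth());
        System.out.println("flat depth:    " + flatUnion(400).depth());
    }
}
```

In this model the chained version grows one level per file while the flat version stays one level deep regardless of the number of inputs, which is why a single many-way union is often suggested for this pattern.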

0 answers