Incrementally appending to a file generated from an RDD

Time: 2015-05-12 17:04:44

Tags: java apache-spark rdd

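In Java, I have a two-dimensional array representing the pixels of an image. Because of the array's size (9744 × 9744), I cannot hold the whole array in a single RDD. I decided to process half of one image row at a time and write each result to a file with saveAsTextFile(). When I do this, I get Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException after the first row has been processed.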

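Is there a way to incrementally add to the file generated by the RDD for the first half of the first row? Below is an example of what I am trying to do.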

    int pixs[][] = new int[4872][2];
    int count = 0;
    int rowI = 0;
    int colJ = 0;
    int colJCurrent = 0;
    JavaRDD<int[]> firstHalf = null;
    sparkPixels = new double[ 1 ][ 9744 ][3];

    for( ; rowI < 1; rowI++ )
    {       
        for(int colCount=0;colCount < 2;colCount++)
        {
            for( colJ=colJCurrent; colJ < (colJCurrent+(rawWidth)); colJ++ )
            {
                pixs[count][0]=rowI;
                pixs[count][1]=colJ;
                count++;
            }
            colJCurrent=colJ;
            count=0;

            firstHalf = ctx.parallelize(Arrays.asList(pixs));
            JavaRDD<SimpleMatrix> firstResults = firstHalf.map(new Function<int[], SimpleMatrix>() {
                private static final long serialVersionUID = 1L;

                public SimpleMatrix call(int pix[])
                {
                    return PixelInfoFunction(pix[0], pix[1]);
                }
            });

            JavaRDD<String> stringOutput = firstResults.map(new Function<SimpleMatrix, String>() {
                private static final long serialVersionUID = 1L;

                public String call(SimpleMatrix i)
                {
                    return PixelInfoInStringFormatt;
                }
            });

            stringOutput.saveAsTextFile("/home/bielasjj/Projection_Output/test");   
        }
        colJ=0;
        count = 0;
        colJCurrent=0;

    }

    ctx.stop();
    ctx.close();

I have modified the code to process one row at a time and add it to an array for later output, but at runtime I get Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

    for(int rowI = 0 ; rowI < 9744; rowI++ )
    {
        for(int colJ = 0 ; colJ < 9744; colJ++ )
        {
            pixs[colJ][0]=rowI;
            pixs[colJ][1]=colJ;
        }

        JavaRDD<int[]> firstHalf = ctx.parallelize(Arrays.asList(pixs));
        JavaRDD<SimpleMatrix> rowResults = firstHalf.map(new Function<int[], SimpleMatrix>() {
            private static final long serialVersionUID = 1L;

            public SimpleMatrix call(int pix[])
            {
                return PixelInfoFunction(pix[0], pix[1]);
            }
        });

        rowResults.foreach(new VoidFunction<SimpleMatrix>(){
            private static final long serialVersionUID = 1L;

            public void call(SimpleMatrix i)
            {
                sparkPixels[(int)i.get(3,0)][(int)i.get(4,0)][0] = i.get(0, 0);
                sparkPixels[(int)i.get(3,0)][(int)i.get(4,0)][1] = i.get(1, 0);
                sparkPixels[(int)i.get(3,0)][(int)i.get(4,0)][2] = i.get(2, 0);
            }
        });

    }
    PrintWriter csv = new PrintWriter("/home/CSV.csv");
    for(int i=0; i < (2 * rawWidth)-1; i++)
    {
        for(int j=0; j < (3 * rawHeight)-1; j++)
        {
            String line = i+ "," + j + ", " + (int)sparkPixels[i][j][0] +", "+(int)sparkPixels[i][j][1]+", "+(int)sparkPixels[i][j][2];
            csv.println(line);
        }
    }
    csv.close();
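
One way to avoid accumulating every result in memory before writing is to append each row to the CSV as soon as it has been computed. The following is only a minimal sketch in plain Java, not the Spark job itself: `computePixel` is a hypothetical stand-in for `PixelInfoFunction`, and the class and method names are made up for illustration. In the Spark version, the results for one row would first have to be brought back to the driver (for example with `collect()`), since `foreach` runs on the executors and its writes to a driver-side array are not guaranteed to be visible outside local mode.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class RowAppender {

    // Hypothetical stand-in for PixelInfoFunction: three values for one pixel.
    public static double[] computePixel(int row, int col) {
        return new double[] { row + col, row - col, row * 0.5 };
    }

    // Computes one row and appends it to the CSV file.
    // Only the pixel currently being written is held in memory.
    public static void appendRow(Path csv, int row, int width) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(csv, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            for (int col = 0; col < width; col++) {
                double[] p = computePixel(row, col);
                out.write(row + "," + col + "," + (int) p[0] + "," + (int) p[1] + "," + (int) p[2]);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("pixels", ".csv");
        int width = 4;  // tiny demo size; the question uses 9744 x 9744
        int height = 3;
        for (int row = 0; row < height; row++) {
            appendRow(csv, row, width);
        }
        System.out.println(Files.readAllLines(csv).size() + " lines written to " + csv);
    }
}
```

Opening the file with `StandardOpenOption.APPEND` on every call keeps the sketch simple; for a full image, keeping a single writer open across all rows would avoid reopening the file 9744 times.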

1 answer:

Answer 0 (score: 0)

Append the row number to the output file name:

    stringOutput.saveAsTextFile("/home/bielasjj/Projection_Output/test" + rowI);
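
saveAsTextFile() writes a directory of part files and refuses to write to a path that already exists, which is why a distinct path is needed for every save. Note that the loop in the question saves twice per row (once for each value of `colCount`), so the row number alone would still collide on the second half. A small sketch of a path helper that includes both indices follows; the `outputPath` name and the `row-…-half-…` naming scheme are assumptions made for illustration, not part of Spark.

```java
public class OutputPaths {

    // Builds a distinct output directory name for each (row, half) pair.
    // Zero-padding the row keeps a lexicographic listing in row order.
    public static String outputPath(String base, int row, int half) {
        return String.format("%s/row-%05d-half-%d", base, row, half);
    }

    public static void main(String[] args) {
        String base = "/home/bielasjj/Projection_Output/test";
        for (int row = 0; row < 2; row++) {
            for (int half = 0; half < 2; half++) {
                // In the Spark job, this string would be passed to saveAsTextFile(...).
                System.out.println(outputPath(base, row, half));
            }
        }
    }
}
```

In the question's loop this would be called as `stringOutput.saveAsTextFile(outputPath(base, rowI, colCount))`. The result is still one directory per half-row rather than a single file, so the part files would need to be concatenated afterwards if one file is required.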