How to split a file containing JSON Array into multiple chunks?

Date: 2018-01-23 19:38:27

Tags: java json

I have a class as follows:

class MyClass {
  private String field1;
  private String field2;
  //getter and setter
}

And I have a list of MyClass objects, say List<MyClass> objects. Now I want to write these objects into a JSON file, something which looks like this:

[
{
"field1": "abc1",
"field2": "xyz1"
},
{
"field1": "abc2",
"field2": "xyz2"
},
{
"field1": "abc3",
"field2": "xyz3"
},
//so on
]

Now, if the size of the file is more than 100KB, then I need to split it into multiple chunks (as few chunks as possible) so that every chunk is just under 100KB and contains valid JSON.

Let's assume the above file exceeds 100KB; then I need to split it into multiple chunks as follows:

chunk1.json
    [
        {
        "field1": "abc1",
        "field2": "xyz1"
        },
        {
        "field1": "abc2",
        "field2": "xyz2"
        }
    ]

chunk2.json    
    [
        {
        "field1": "abc3",
        "field2": "xyz3"
        },
        //....
    ]

After that I can process the files one by one. How can I achieve this?

2 answers:

Answer 0 (score: 1)

You can do it like this:

public static void main(String[] args) throws IOException {

    ObjectMapper mapper = new ObjectMapper(); // reuse one mapper instead of creating a new one per iteration
    String summary = "";

    for (int a = 0; a < 100000; a++) {
        String current = "\r\n" + mapper.writeValueAsString(new MyClass());

        if ((summary + current).getBytes("UTF-8").length > 100000) {
            System.out.println("Overload 100 kb!");
            printFile(summary);
            summary = current; // start the next chunk with the object that didn't fit
        } else {
            summary = summary + current;
        }
    }
    if (!summary.isEmpty()) {
        printFile(summary); // don't forget to flush the last, partially filled chunk
    }
}

public static void printFile(String string) throws IOException {
    // timestamped file name with millisecond precision, so successive chunks don't overwrite each other
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd_HH_mm_ss_SSS");
    try (PrintWriter out = new PrintWriter("C:\\Chunks\\" + sdf.format(new Date()) + ".txt")) {
        out.println(string);
    }
}

I took UTF-8 as the encoding. The byte[] length is computed when deciding where to split, so 100000 bytes is 100 kB.

For serialization I used the standard Jackson approach; this is the Maven dependency:

   <!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-databind -->
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.9.3</version>
</dependency>

...and finally the simple POJO class:

public class MyClass implements Serializable {

    private static final long serialVersionUID = 1L;

    private String field1;
    private String field2;

    public MyClass() {
    }

    public String getField1() {
        return field1;
    }

    public void setField1(String field1) {
        this.field1 = field1;
    }

    public String getField2() {
        return field2;
    }

    public void setField2(String field2) {
        this.field2 = field2;
    }
}

Let me know if this helps!

Answer 1 (score: 0)

The exact implementation may depend on the serialization framework you use (have a look at Jackson, it is easy to work with), and some frameworks offer a dedicated streaming API that can support exactly this. In the scenario you describe, though, a simpler solution is possible:

Serialize each instance inside a for loop and collect the strings in a buffer (e.g. a StringBuilder, or write directly to an OutputStream). Before appending a string to this buffer, check whether the buffer's size plus the new string's size would exceed 100kB. The only pieces still missing then are the few characters for the array start ([), the separators (,) and the end (]).
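A minimal sketch of this buffered approach. It is framework-agnostic: it takes already-serialized JSON object strings as input, so the Jackson call is left out, and the class name, helper method and the small byte limit in the demo are illustrative, not from the original answer:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class JsonChunker {

    // Splits pre-serialized JSON object strings into valid JSON array chunks,
    // each at most maxBytes long, counting the surrounding [ ] and the commas.
    public static List<String> chunk(List<String> jsonObjects, int maxBytes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder buf = new StringBuilder("[");
        for (String obj : jsonObjects) {
            int separator = buf.length() > 1 ? 1 : 0; // "," before the 2nd and later items
            // bytes needed if we append this object and then close the array with "]"
            int projected = byteLen(buf.toString()) + separator + byteLen(obj) + 1;
            if (projected > maxBytes && buf.length() > 1) {
                chunks.add(buf.append("]").toString()); // close and emit the current chunk
                buf = new StringBuilder("[");
                separator = 0;
            }
            if (separator == 1) buf.append(",");
            buf.append(obj);
        }
        if (buf.length() > 1) chunks.add(buf.append("]").toString());
        return chunks;
    }

    private static int byteLen(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        List<String> objects = List.of(
                "{\"field1\":\"abc1\",\"field2\":\"xyz1\"}",
                "{\"field1\":\"abc2\",\"field2\":\"xyz2\"}",
                "{\"field1\":\"abc3\",\"field2\":\"xyz3\"}");
        for (String chunk : chunk(objects, 80)) {
            System.out.println(chunk);
        }
    }
}
```

Note that a single object larger than maxBytes still gets emitted in its own oversized chunk, since it cannot be split further without breaking the JSON.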

(To get the size calculation right you may have to convert the strings to bytes, because characters like Äφς occupy more than one byte each. Perhaps your JSON framework already handles this for you.)
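The difference between the character count and the UTF-8 byte count can be seen directly; the strings below are just examples:

```java
import java.nio.charset.StandardCharsets;

public class ByteLengthDemo {
    public static void main(String[] args) {
        String ascii = "abc";
        String multi = "Äφς";
        // both strings have the same number of chars...
        System.out.println(ascii.length());                                // 3
        System.out.println(multi.length());                                // 3
        // ...but a different number of UTF-8 bytes
        System.out.println(ascii.getBytes(StandardCharsets.UTF_8).length); // 3
        System.out.println(multi.getBytes(StandardCharsets.UTF_8).length); // 6
    }
}
```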

You can of course do the serialization by hand, but especially if your strings can contain characters like quotes ("), you may end up writing a lot of code before the JSON is well-formed.

[edit: as others have pointed out, if a single string is longer than 100kB, the spec cannot be met]