Problem handling a JsonArray while masking PI data

Time: 2017-11-14 09:57:17

Tags: java json parsing bigdata

I have email data on the order of petabytes, and my JSON is structured as follows:

[{
"EmailID": 1234567, 
"category": "Email", 
"tsubject": "Regarding Complain no. XXXXXX", 
"tto": "Email (abc@abc.com)", 
"tCc": "", "tBcc": null, 
"tfrom": "abc\abc@abc.in", 
"Email": "", 
"customer_emailID": "abc@abc.in",
"Customer_name": "abc", 
"mail_time": "2016-11-08 12:26:54.0"
},{
"EmailID": 1234567, 
"category": "Chat", 
"tsubject": "Regarding Complain no. XXXXXX", 
"tto": "Email (abc@abc.com)", 
"tCc": "", "tBcc": null, 
"tfrom": "abc\abc@abc.in", 
"Email": "", 
"customer_emailID": "abc@abc.in",
"Customer_name": "abc", 
"mail_time": "2016-11-08 12:26:54.0"
},...]

I use the following code to mask the personal information (email data) in my JSON file:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Scanner;
    import java.util.Set;

    import org.json.JSONObject;

    public class CUST_EMAIL {

        public static void main(String[] args) throws IOException {
            File dir = new File("D:\\CUst\\CUsto");
            File[] files = dir.listFiles();
            // Sort the files so that the most recently modified one comes first.
            Arrays.sort(files, new Comparator<Object>() {
                public int compare(Object o1, Object o2) {
                    return compare((File) o1, (File) o2);
                }

                private int compare(File f1, File f2) {
                    long result = f2.lastModified() - f1.lastModified();
                    if (result > 0) {
                        return 1;
                    } else if (result < 0) {
                        return -1;
                    } else {
                        return 0;
                    }
                }
            });
            try {
                System.out.println(Arrays.asList(files));
                System.out.println(files[0]);
            } catch (ArrayIndexOutOfBoundsException exception) {
                System.out.println(exception);
            }
            FileInputStream inputStream = null;

            String dd = files[0].getName();
            System.out.println(dd);
            FileOutputStream fout = new FileOutputStream("D:\\CUst\\custmask" + dd);

            Scanner sc = null;
            try {
                inputStream = new FileInputStream(files[0]);
                sc = new Scanner(inputStream, "UTF-8");
                while (sc.hasNextLine()) {
                    String line = sc.nextLine();
                    System.out.println(line);

                    // Each line is parsed as a standalone JSON object;
                    // this is the call that appears in the stack trace below.
                    JSONObject obj = new JSONObject(line);

                    List<String> fieldNames = Arrays.asList("tto", "tCc", "tBcc", "customer_emailID", "Customer_name");

                    byte[] content = mask(obj, fieldNames, "777777").getBytes();
                    fout.write(content);
                    fout.write("\n".getBytes());
                    // System.out.println(mask(obj, fieldNames, "0000000"));
                }
                // note that Scanner suppresses exceptions
                if (sc.ioException() != null) {
                    throw sc.ioException();
                }
            } finally {
                fout.close();
                if (inputStream != null) {
                    inputStream.close();
                }
                if (sc != null) {
                    sc.close();
                }
            }
            // Remove the input file once it has been processed.
            if (files[0].exists()) {
                files[0].delete();
            } else {
                System.out.println("no file to delete");
            }
        }

        // Overwrites the value of every listed field with the mask string
        // and returns the object serialized back to JSON text.
        public static String mask(JSONObject object, List<String> fieldNames, String mask) {
            Set<String> columns = object.keySet();
            //Field[] fields = object.getClass().getDeclaredFields();
            for (String i : columns) {
                if (fieldNames.contains(i)) {
                    object.put(i, mask);
                }
            }
            return object.toString();
        }

    }

When I run the program, I get the following error:

Exception in thread "main" org.json.JSONException: A JSONObject text must begin with '{' at 1 [character 2 line 1]
at org.json.JSONTokener.syntaxError(JSONTokener.java:451)
at org.json.JSONObject.<init>(JSONObject.java:196)
at org.json.JSONObject.<init>(JSONObject.java:320)
at com.TALISMA_EMAIL.main(TALISMA_EMAIL.java:58)

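I think that because of the [ and ] at the beginning and end, the text cannot be read as a JSONObject; it needs a JSONArray. I would like a suggestion on how to handle this email JSON structure with my code.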
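As a rough illustration of that suspicion, assuming the file is laid out exactly like the sample above: `new JSONObject(String)` expects the first non-whitespace character of its input to be `{`, but the loop feeds it one line at a time, and the first line of the file is `[{`. The helper below is hypothetical (`TopLevelJson` and `kindOf` are names made up for this sketch, not part of org.json) and uses only the standard library to show what each line looks like to a parser.

```java
// Hypothetical helper: classifies a piece of JSON text by its first
// non-whitespace character, which is the same check that decides whether
// the text can start a JSON object or a JSON array.
final class TopLevelJson {

    enum Kind { OBJECT, ARRAY, OTHER }

    static Kind kindOf(String text) {
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (Character.isWhitespace(ch)) {
                continue; // skip leading whitespace
            }
            if (ch == '{') {
                return Kind.OBJECT;
            }
            if (ch == '[') {
                return Kind.ARRAY;
            }
            return Kind.OTHER;
        }
        return Kind.OTHER; // empty or whitespace-only input
    }
}
```

With the sample file, `kindOf("[{")` gives `ARRAY` for the first line and `kindOf("\"EmailID\": 1234567,")` gives `OTHER` for the second, so no single line is a complete object on its own.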

**Note:** Please suggest any changes to the question if needed.

1 Answer:

Answer 0: (score: 0)

Remove the while loop and replace it with the following:

    StringBuilder sb = new StringBuilder();
    // loop over the lines to construct a StringBuilder
    // containing the whole input
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        System.out.println(line);

        sb.append(line);
    }
    // then perform your processing
    List<String> fieldNames = Arrays.asList("tto", "tCc", "tBcc", "customer_emailID", "Customer_name");
    try {
        JSONArray arr = new JSONArray(sb.toString());

        for (int i = 0; i < arr.length(); i++) {
            JSONObject obj = arr.getJSONObject(i);

            byte[] content = mask(obj, fieldNames, "777777").getBytes();
            fout.write(content);
            fout.write("\n".getBytes());
        }
    } catch (JSONException e) {
        // handle the exception
    }
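
The snippet above builds the whole file in memory before parsing it. The question mentions data on the order of petabytes but does not say how large each file is, so if a single file may not fit in memory, one possible alternative is to cut the top-level array into its individual objects while reading. The following is a minimal sketch under these assumptions: the input is a well-formed top-level array of objects, and `JsonArraySplitter` and `forEachObject` are hypothetical names introduced here, not part of org.json. It uses only the standard library and tracks brace depth and string literals so that braces inside string values are ignored.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

// Hypothetical helper: streams the objects of a top-level JSON array
// one at a time instead of loading the whole array.
final class JsonArraySplitter {

    // Reads characters from `in` and passes the text of each top-level
    // object to `sink` as soon as its closing brace has been read.
    public static void forEachObject(Reader in, Consumer<String> sink) throws IOException {
        StringBuilder current = new StringBuilder();
        int depth = 0;            // nesting level of '{' ... '}'
        boolean inString = false; // currently inside a "..." literal
        boolean escaped = false;  // previous character was a backslash in a string
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (depth > 0) {
                current.append(ch); // everything inside an object is kept verbatim
            }
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (ch == '\\') {
                    escaped = true;
                } else if (ch == '"') {
                    inString = false;
                }
                continue; // braces inside strings do not affect depth
            }
            if (ch == '"') {
                inString = true;
            } else if (ch == '{') {
                if (depth == 0) {
                    current.append(ch); // opening brace of a new top-level object
                }
                depth++;
            } else if (ch == '}' && depth > 0) {
                depth--;
                if (depth == 0) {
                    sink.accept(current.toString());
                    current.setLength(0);
                }
            }
        }
    }
}
```

Each string handed to the sink is the text of one complete object, so it could be passed to `new JSONObject(text)` and then to the `mask` method from the question, writing one masked object per line as before. Wrapping the `FileInputStream` in a `BufferedReader` would avoid reading the file one byte at a time.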