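I have a question about converting JSON to CSV, specifically about a memory problem (at least I think that is what it is). I wrote some functions that should handle this, and they work for small JSON files. With large JSON files, the JFrame freezes and nothing happens for several minutes (I killed the process with the Task Manager after about 5 minutes). The source JSON file has about 30,000 lines.
What I am trying to do: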
"actor" : "ObjectId("12345")
etc. should be corrected to "actor" : "12345"
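This correction can be written as one precompiled regular expression. What follows is a minimal sketch. It assumes the wrapped values never contain a double quote. The pattern and the class name MongoShellFix are hypothetical, because the question does not show its actual regEx and innerString fields. The optional leading quote in the pattern also covers the form written above, which has a stray quote before ObjectId.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the question's real regEx/innerString fields are not shown.
class MongoShellFix {
    // Matches ObjectId("...") or ISODate("..."), optionally preceded by a stray quote,
    // and captures the quoted value. Compiled once instead of once per token.
    private static final Pattern WRAPPED =
            Pattern.compile("\"?(?:ObjectId|ISODate)\\(\\s*\"([^\"]*)\"\\s*\\)");

    // Replaces each wrapper with just the quoted value, e.g. ObjectId("12345") -> "12345".
    static String unwrap(String line) {
        return WRAPPED.matcher(line).replaceAll("\"$1\"");
    }
}
```

For example, MongoShellFix.unwrap applied to "actor" : ObjectId("12345") yields "actor" : "12345". It works on any string, so it can be applied to one line at a time.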
What I have so far:
public void mongoExportAndSplitFilter() {
    ReadFileAndSave reader = new ReadFileAndSave();
    String jsonFilePath = this.converterView.sourceTextField.getText();
    //String targetFilePath = this.converterView.targetTextField.getText();
    File jsonFile = new File(jsonFilePath);
    Scanner scanner = new Scanner(reader.readFileAndCorrectOutput(jsonFile));
    int j = 0;
    StringBuffer sb = new StringBuffer();
    reader.readPartOfFileAndSave("src/main/resources", scanner, j, sb);
    //System.out.println("STEP 1: INPUT FILE (" + jsonFilePath + ") HAS BEEN CORRECTED!");
    //System.out.println("STEP 2: INPUT FILE (" + jsonFilePath + ") HAS BEEN SPLIT WHILE PARSING!");
    this.filterView.setVisible(false);
    this.filterView.dispose();
    this.filterFlag = 1;
}
/**
 * Utility function to correct the mongoexport JSON output.
 *
 * @param file The file which should be corrected.
 * @return The corrected JSON string.
 */
public String readFileAndCorrectOutput(File file) {
    String jsonStringCorrected = "";
    StringBuffer sb = new StringBuffer();
    try {
        Scanner scanner = new Scanner(file);
        while (scanner.hasNext()) {
            String next = scanner.next();
            if (next.contains("ObjectId") || next.contains("ISODate")) {
                Matcher m = Pattern.compile(this.regEx)
                        .matcher(next);
                if (m.find()) {
                    next = next.replaceAll(this.regEx, this.innerString);
                }
            }
            //jsonStringCorrected += next;
            sb.append(next);
        }
        scanner.close();
        jsonStringCorrected = sb.toString();
        JSONObject jsonObject = new JSONObject(jsonStringCorrected);
        jsonStringCorrected = jsonObject.toString(2);
    } catch (FileNotFoundException ex) {
        Logger.getLogger(ReadFileAndSave.class.getName()).log(Level.SEVERE, null, ex);
    }
    return jsonStringCorrected;
}
/**
 * Utility function to read a JSON file part by part and save each part to a separate JSON file.
 * @param filepath The path the parts are saved under.
 * @param scanner The scanner which wraps the file content and returns its tokens.
 * @param j The file counter; each part is saved under a new counter value.
 * @param sb The buffer that collects the tokens of the current part.
 * @return Always returns an empty string.
 */
public String readPartOfFileAndSave(String filepath, Scanner scanner, int j, StringBuffer sb) {
    String jsonString = "";
    int i = 0;
    ++j;
    while (scanner.hasNext()) {
        String token = scanner.next();
        //jsonString += token;
        sb.append(token);
        if (token.contains("{")) {
            i++;
        }
        if (token.contains("}")) {
            i--;
        }
        if (i == 0) {
            jsonString = sb.toString();
            JSONObject jsonObject = new JSONObject(jsonString);
            jsonString = jsonObject.toString(2);
            saveFile(filepath, "actor", j, jsonString);
            sb.setLength(0); // start the next part with an empty buffer
            jsonString = readPartOfFileAndSave(filepath, scanner, j, sb);
        }
    }
    return "";
}
Does anyone know how to solve this problem?
EDIT
Here is a snippet of the file (the first 3 lines):
{ "verb" : "access", "target" : { "id" : "5485a7050ac61b1339a4da0e", "inquiryPhase" : "Orientation", "displayName" : "Orientation", "objectType" : "phase" }, "generator" : { "id" : "5485a7050ac61b1339a4da09", "displayName" : "LochemC", "objectType" : "ils", "url" : "http://graasp.eu/spaces/5485a7050ac61b1339a4da09" }, "provider" : { "id" : "5485a7050ac61b1339a4da09", "inquiryPhase" : "ils", "displayName" : "LochemC", "objectType" : "ils", "url" : "http://graasp.eu/spaces/5485a7050ac61b1339a4da09" }, "object" : { "id" : "5485a7050ac61b1339a4da09", "displayName" : "LochemC", "objectType" : "ils" }, "actor" : { "id" : "Bas Kollöffel (UT)@5485a7050ac61b1339a4da09", "displayName" : "Bas Kollöffel (UT)", "objectType" : "person" }, "published" : "2014-12-08T13:40:45.409Z", "publishedClient" : "2014-12-08T13:40:45.409Z", "publishedServer" : { "$date" : 1418046045490 }, "_id" : { "$oid" : "5485aa5dc372cdbb21daea33" } }
{ "verb" : "access", "target" : { "id" : "5485a7050ac61b1339a4da13", "inquiryPhase" : "Conceptualisation", "displayName" : "Conceptualisation", "objectType" : "phase" }, "generator" : { "id" : "5485a7050ac61b1339a4da09", "displayName" : "LochemC", "objectType" : "ils", "url" : "http://graasp.eu/spaces/5485a7050ac61b1339a4da09" }, "provider" : { "id" : "5485a7050ac61b1339a4da09", "inquiryPhase" : "ils", "displayName" : "LochemC", "objectType" : "ils", "url" : "http://graasp.eu/spaces/5485a7050ac61b1339a4da09" }, "object" : { "id" : "5485a7050ac61b1339a4da13", "inquiryPhase" : "Conceptualisation", "displayName" : "Conceptualisation", "objectType" : "phase" }, "actor" : { "id" : "Bas Kollöffel (UT)@5485a7050ac61b1339a4da09", "displayName" : "Bas Kollöffel (UT)", "objectType" : "person" }, "published" : "2014-12-08T13:40:46.867Z", "publishedClient" : "2014-12-08T13:40:46.867Z", "publishedServer" : { "$date" : 1418046046952 }, "_id" : { "$oid" : "5485aa5ec372cdbb21daea34" } }
{ "verb" : "access", "target" : { "id" : "5485a7050ac61b1339a4da1e", "inquiryPhase" : "Investigation", "displayName" : "Investigation", "objectType" : "phase" }, "generator" : { "id" : "5485a7050ac61b1339a4da09", "displayName" : "LochemC", "objectType" : "ils", "url" : "http://graasp.eu/spaces/5485a7050ac61b1339a4da09" }, "provider" : { "id" : "5485a7050ac61b1339a4da09", "inquiryPhase" : "ils", "displayName" : "LochemC", "objectType" : "ils", "url" : "http://graasp.eu/spaces/5485a7050ac61b1339a4da09" }, "object" : { "id" : "5485a7050ac61b1339a4da1e", "inquiryPhase" : "Investigation", "displayName" : "Investigation", "objectType" : "phase" }, "actor" : { "id" : "Bas Kollöffel (UT)@5485a7050ac61b1339a4da09", "displayName" : "Bas Kollöffel (UT)", "objectType" : "person" }, "published" : "2014-12-08T13:40:48.582Z", "publishedClient" : "2014-12-08T13:40:48.582Z", "publishedServer" : { "$date" : 1418046048662 }, "_id" : { "$oid" : "5485aa60c372cdbb21daea35" } }
Answer (score: 0)
Don't read the whole file at once. Read it line by line, apply the corrections, and write the output as you go.
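Here is a minimal sketch of the streaming approach. It assumes the export holds one JSON document per line, which matches the snippet in the question, and that the files are UTF-8. The class and method names are hypothetical. The fix parameter stands for whatever per-line correction you apply. With this approach only the current line is kept in memory. The question's code instead holds the whole file, a StringBuffer copy of it, and a re-serialized JSON string all at once.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

// Hypothetical helper class; names are illustrative only.
class StreamingCorrector {
    // Reads one line, corrects it, writes it, and moves on, so only the current
    // line is held in memory. Returns the number of lines processed.
    static long correct(Reader in, Writer out, UnaryOperator<String> fix) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        long count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(fix.apply(line));
            writer.write('\n'); // fixed "\n" separator regardless of platform
            count++;
        }
        writer.flush(); // the caller owns (and closes) in and out
        return count;
    }

    // File-to-file convenience wrapper; both files are closed automatically.
    static long correctFile(Path source, Path target, UnaryOperator<String> fix) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            return correct(reader, writer, fix);
        }
    }
}
```

Calling correctFile(source, target, line -> line.replaceAll(regEx, innerString)) would roughly reproduce the question's correction. Here regEx and innerString are the question's own fields. The difference is that the replacement is applied per line rather than per Scanner token.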
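Also, it doesn't look like you need to parse and re-create the JSON here. You should be able to do all the processing you need at the raw-text level.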
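As an illustration of working on the raw text, note what the posted snippet actually contains. It has no ObjectId(...) calls, only the wrapped forms { "$oid" : "..." } and { "$date" : ... }. The sketch below unwraps them with plain regular expressions and no JSON parser. It assumes the goal for these wrappers is the same unwrapping, that a wrapper never spans multiple lines, and that the $oid value contains no escaped quotes. The class name RawTextFix is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper: unwraps mongoexport's $oid/$date objects on the raw text.
class RawTextFix {
    // { "$oid" : "5485aa5dc372cdbb21daea33" }  ->  "5485aa5dc372cdbb21daea33"
    private static final Pattern OID =
            Pattern.compile("\\{\\s*\"\\$oid\"\\s*:\\s*(\"[^\"]*\")\\s*\\}");
    // { "$date" : 1418046045490 }  ->  1418046045490
    private static final Pattern DATE =
            Pattern.compile("\\{\\s*\"\\$date\"\\s*:\\s*(-?\\d+)\\s*\\}");

    // Pure string processing on one line; nothing is parsed or re-serialized.
    static String flatten(String line) {
        String out = OID.matcher(line).replaceAll("$1");
        return DATE.matcher(out).replaceAll("$1");
    }
}
```

Applied to the end of the first snippet line, "publishedServer" : { "$date" : 1418046045490 }, "_id" : { "$oid" : "5485aa5dc372cdbb21daea33" } } becomes "publishedServer" : 1418046045490, "_id" : "5485aa5dc372cdbb21daea33" }.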
I also don't think readPartOfFileAndSave() needs to be recursive; you can do everything in an outer loop.
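Here is a sketch of that outer loop as a single non-recursive pass. It counts brace depth one character at a time, ignoring braces inside JSON strings. Each complete top-level object is handed to a callback, and the buffer is then cleared. The names are hypothetical. The callback is where the question's saveFile(filepath, "actor", j, json) call would go, with j incremented per object. Unlike the recursive version, the call stack does not grow with the number of objects. Unlike counting on Scanner tokens, it also keeps the whitespace inside strings such as "Bas Kollöffel (UT)".

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

// Hypothetical helper; replaces the recursive readPartOfFileAndSave() with one loop.
class ObjectSplitter {
    // Emits every complete top-level {...} object to onObject and returns how many were found.
    static int split(Reader in, Consumer<String> onObject) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        StringBuilder current = new StringBuilder();
        int depth = 0;
        boolean inString = false;   // inside a JSON string literal?
        boolean escaped = false;    // previous char was a backslash inside a string?
        int count = 0;
        int c;
        while ((c = reader.read()) != -1) {
            char ch = (char) c;
            if (depth == 0 && ch != '{') {
                continue; // skip newlines/whitespace between objects
            }
            current.append(ch);
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (ch == '\\') {
                    escaped = true;
                } else if (ch == '"') {
                    inString = false;
                }
            } else if (ch == '"') {
                inString = true;
            } else if (ch == '{') {
                depth++;
            } else if (ch == '}') {
                depth--;
                if (depth == 0) {
                    onObject.accept(current.toString());
                    current.setLength(0); // reuse the buffer for the next object
                    count++;
                }
            }
        }
        return count;
    }
}
```

Since the export in the question has one object per line, reading line by line would also work. The character loop additionally handles objects that span several lines.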