I'm trying to reformat a JSON file and remove most of its contents. Here is the original JSON file:
"2597401":[{"jobID":"2597401",
"account":"TG-CCR120014",
"user":"charngda",
"pkgT":{"pgi/7.2-5":{"libA":["libpgc.so"],
"flavor":["default"]}},
"startEpoch":"1338497979",
"runTime":"1022",
"execType":"user:binary",
"exec":"ft.D.64",
"numNodes":"4",
"sha1":"5a79879235aa31b6a46e73b43879428e2a175db5",
"execEpoch":1336766742,
"execModify":"Fri May 11 15:05:42 2012",
"startTime":"Thu May 31 15:59:39 2012",
"numCores":"64",
"sizeT":{"bss":"1881400168","text":"239574","data":"22504"}},
{"jobID":"2597401",
"account":"TG-CCR120014",
"user":"charngda",
"pkgT":{"pgi/7.2-5":{"libA":["libpgc.so"],
"flavor":["default"]}},
"startEpoch":"1338497946",
"runTime":"33" "execType":"user:binary",
"exec":"cg.C.64",
"numNodes":"4",
"sha1":"caf415e011e28b7e4e5b050fb61cbf71a62a9789",
"execEpoch":1336766735,
"execModify":"Fri May 11 15:05:35 2012",
"startTime":"Thu May 31 15:59:06 2012",
"numCores":"64",
"sizeT":{"bss":"29630984","text":"225749","data":"20360"}},
{"jobID":"2597401",
"account":"TG-CCR120014",
"user":"charngda",
"pkgT":{"pgi/7.2-5": {"libA":["libpgc.so"],
"flavor":["default"]}},
"startEpoch":"1338500447",
"runTime":"145",
"execType":"user:binary",
"exec":"mg.D.64",
"numNodes":"4",
"sha1":"173de32e1514ad097b1c051ec49c4eb240f2001f",
"execEpoch":1336766756,
"execModify":"Fri May 11 15:05:56 2012",
"startTime":"Thu May 31 16:40:47 2012",
"numCores":"64",
"sizeT":{"bss":"456954120","text":"426186","data":"22184"}},{"jobID":"2597401",
"account":"TG-CCR120014",
"user":"charngda",
"pkgT":{"pgi/7.2-5":{"libA":["libpgc.so"],
"flavor":["default"]}},
"startEpoch":"1338499002",
"runTime":"1444",
"execType":"user:binary",
"exec":"lu.D.64",
"numNodes":"4",
"sha1":"c6dc16d25c2f23d2a3321d4feed16ab7e10c2cc1",
"execEpoch":1336766748,
"execModify":"Fri May 11 15:05:48 2012",
"startTime":"Thu May 31 16:16:42 2012",
"numCores":"64",
"sizeT":{"bss":"199850984","text":"474218","data":"27064"}}],
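For each jobID, I only want to keep the "exec" field(s) and the jobID. How can I build a regular expression that dumps the rest of the data? Ideally, I'd like to end up with something like this: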
JobID exec1 exec2 exec3
Is there a way to do this?
Thanks in advance.
Answer 0 (score: 2)
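Since you didn't specify your regex engine, I'll assume PCRE for this answer.
Based on the JSON format, you can use this regex to match the unwanted key-value pairs and replace them with an empty string: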
/(,\s*(*SKIP))?+("(?!jobID"|exec)[^"]+"\s*+:\s*+("[^"]*"|{(?2)?+(?>,\s*(?2))*}|\[(?3)?+(?>,\s*(?3))*\]))(?(1)|,?)/g
Here is what you get after applying the regex replacement:
"2597401":[{"jobID":"2597401",
"execType":"user:binary",
"exec":"ft.D.64",
"execEpoch":1336766742,
"execModify":"Fri May 11 15:05:42 2012"},
{"jobID":"2597401" "execType":"user:binary",
"exec":"cg.C.64",
"execEpoch":1336766735,
"execModify":"Fri May 11 15:05:35 2012"},
{"jobID":"2597401",
"execType":"user:binary",
"exec":"mg.D.64",
"execEpoch":1336766756,
"execModify":"Fri May 11 15:05:56 2012"},{"jobID":"2597401",
"execType":"user:binary",
"exec":"lu.D.64",
"execEpoch":1336766748,
"execModify":"Fri May 11 15:05:48 2012"}],
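As you can see, the result contains invalid syntax in the fragment "jobID":"2597401" "execType":"user:binary", but that comes from a syntax error in your data: the second record has no comma after "runTime":"33".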
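In the output above, execType, execEpoch and execModify survive along with exec. That follows from the key check (?!jobID"|exec): it rejects every key that merely begins with exec, not only the exact key exec. The Python sketch below (standard re module, which supports this lookahead) tests just that key part against the field names of one record from the question. STRICT_KEY_RE is a hypothetical variant added for comparison, not part of the answer.

```python
import re

# The key part of the answer's pattern: a quoted key that does NOT
# start with jobID" or with exec.
ANSWER_KEY_RE = re.compile(r'"(?!jobID"|exec)[^"]+"')
# Hypothetical stricter variant: only the exact keys jobID and exec are spared.
STRICT_KEY_RE = re.compile(r'"(?!jobID"|exec")[^"]+"')

# Field names taken from one record of the question's data.
KEYS = ["jobID", "account", "user", "pkgT", "startEpoch", "runTime",
        "execType", "exec", "numNodes", "sha1", "execEpoch",
        "execModify", "startTime", "numCores", "sizeT"]


def kept_keys(key_re):
    """Return the keys the pattern would NOT match, i.e. the ones left in place."""
    return [k for k in KEYS if not key_re.fullmatch(f'"{k}"')]


print(kept_keys(ANSWER_KEY_RE))  # ['jobID', 'execType', 'exec', 'execEpoch', 'execModify']
print(kept_keys(STRICT_KEY_RE))  # ['jobID', 'exec']
```

Swapping the stricter lookahead into the full pattern is not enough on its own: the value alternatives there only cover string literals, objects and arrays, so a pair with an unquoted number such as "execEpoch":1336766742 would not be matched and would stay.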
Explanation:
/(,\s*(*SKIP))?+
# Possessively match an optional comma plus whitespace,
# so the engine never backtracks into it.
# If the comma is matched, the (*SKIP) verb makes a failed attempt
# resume after the comma instead of retrying one character later.
# Key-value pairs not worth keeping:
(
"(?!jobID"|exec)[^"]+" # Only keys we don't want: not jobID, not starting with exec.
\s*+:\s*+ # The colon, possessively skipping surrounding whitespace.
( # The value; objects and arrays recurse into groups 2 and 3.
"[^"]*"
# A string literal.
| {(?2)?+(?>,\s*(?2))*}
# Or: An object storing some key-value pairs
# we don't care about.
| \[(?3)?+(?>,\s*(?3))*\]
# Or: An array storing some values
# we don't care about.
)
)
(?(1)|,?)
# If no leading comma was consumed, consume a trailing one instead, so the result stays valid JSON.
/gx
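To see the comma balancing from the explanation in isolation, here is a minimal Python sketch. Python's standard re module has no (*SKIP) verb and no subroutine recursion like (?2), so this is a simplified, non-recursive variant under the assumption of flat objects whose values are all string literals; it keeps the same key check and the same conditional (?(1)|,?). FLAT_RE and strip_unwanted are hypothetical names.

```python
import re

# Simplified, non-recursive variant of the answer's pattern (flat objects,
# string values only):
#   group 1: optional leading comma plus whitespace
#   group 2: an unwanted "key":"value" pair
#   (?(1)|,?): if group 1 did not match, eat a trailing comma instead
FLAT_RE = re.compile(r'(,\s*)?("(?!jobID"|exec)[^"]+"\s*:\s*"[^"]*")(?(1)|,?)')


def strip_unwanted(obj_text):
    """Delete unwanted key/value pairs from a flat JSON object string."""
    return FLAT_RE.sub("", obj_text)


# Pairs after the first one are removed together with their leading comma.
print(strip_unwanted('{"jobID":"1","user":"x","exec":"a","numNodes":"4"}'))
# → {"jobID":"1","exec":"a"}

# A first pair has no leading comma, so its trailing comma goes instead.
print(strip_unwanted('{"user":"x","jobID":"1","exec":"a"}'))
# → {"jobID":"1","exec":"a"}
```

Either way exactly one comma disappears per removed pair, which is what keeps the remaining text valid JSON.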
Here is the regex demo.
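The replacement still leaves JSON-shaped text rather than the one-line "JobID exec1 exec2 exec3" summary the question asks for. If post-processing in a scripting language is an option, the Python sketch below is one way to get there. It swaps in a different technique (scanning for the two wanted fields with the standard re module instead of deleting everything else), and RAW is a hypothetical, trimmed copy of the question's data. Because it never parses JSON, the missing comma after "runTime":"33" does not get in its way.

```python
import re

# Hypothetical trimmed copy of the question's data (most fields dropped);
# the missing comma after "runTime":"33" is kept on purpose.
RAW = '''"2597401":[{"jobID":"2597401",
"runTime":"1022",
"execType":"user:binary",
"exec":"ft.D.64"},
{"jobID":"2597401",
"runTime":"33" "execType":"user:binary",
"exec":"cg.C.64"},
{"jobID":"2597401",
"exec":"mg.D.64"},{"jobID":"2597401",
"exec":"lu.D.64"}],'''

# The closing quote right after jobID/exec means "execType", "execEpoch"
# and "execModify" are never matched.
FIELD_RE = re.compile(r'"(jobID|exec)"\s*:\s*"([^"]*)"')


def summarize(text):
    """Return one 'JobID exec1 exec2 ...' line per distinct jobID."""
    groups = {}      # jobID -> list of exec names (dicts keep insertion order)
    current = None   # the most recently seen jobID
    for m in FIELD_RE.finditer(text):
        key, value = m.group(1), m.group(2)
        if key == "jobID":
            current = value
            groups.setdefault(current, [])
        elif current is not None:
            groups[current].append(value)
    return [" ".join([job, *execs]) for job, execs in groups.items()]


for line in summarize(RAW):
    print(line)  # 2597401 ft.D.64 cg.C.64 mg.D.64 lu.D.64
```

Grouping by jobID collapses all records of the same job into one line, matching the desired output; this relies on each record listing jobID before exec, as the question's data does.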