Regex to extract text within nested quotes [Java / Json]

Time: 2017-10-31 06:06:20

Tags: java json regex pattern-matching

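From the text below, I want to extract the value in quotes for a key such as "hash". The value associated with hash runs from its opening quote to its closing quote, in this case: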

00000e96c46d15aeaaf9ef6f88a295a8f17207d4cd9ac074d2314680095befc854d5a00600602af2fe03a24b61566ca2d8a6b858b0af840309ae449316833923

My pattern is:

Scanner s = new Scanner(new File(path.toString()));
Pattern pattern = Pattern.compile("\"hash\": \".*\"");
String nextMatch = s.findWithinHorizon(pattern, 0);

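Explanation of the pattern: I look anywhere for a quote, followed by the word hash and another quote. Then comes a colon plus one space. After that comes arbitrary text until the next quote appears.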

Sadly, this pattern doesn't work, and I don't understand why.

{"hash": "00000e96c46d15aeaaf9ef6f88a295a8f17207d4cd9ac074d2314680095befc854d5a00600602af2fe03a24b61566ca2d8a6b858b0af840309ae449316833923", "block": "{\"type\": \"block\", \"transactions\": [], \"timestamp\": \"2017-09-07T07:09:52.628676\", \"reward\": \"d5075b5d43cf97b73bd6483488f1f6a648dc83add93a37bb0817b17331fd51d989e2cf9fd3c8c0206fb89b84cf9e151b7d2123e4f6d71c95868bdfe1f4aa6b9e754a51a8e04bd49f5eec1931840315bc42844b715250534612da5e5809bdb14c496ad1a2d4b00823b80aacb7023667ca6923088b438dc5053d5bbf29a61620b28afa5d52d325ed8aa073a7f3a37e675c6bdf2dad09b809c8f3c60206392764458effb2c512d072af0cc7ea96058e1e19eccc72072939d5d16409843151b55607715f7ea9eff911914be9c88f1e719ed5cc5e95737977feeedbbd96b9150ce5a54c491aa94eab58df129445d89c9f8937c598ba95380a42c22e06ed2f0da4959b331e99e25554c122a095b2520ba3dcff6585c8c07cc6da9d3ad7e71a0ade2c6704c7c27aca3337916794efc4fa1a6e9784bbce1173ee7b408ece86a8a37f84706ed8092c06bb914510a97edffdda55ec09141bbfdf5af7029aa82e5f7e7da1cb1781426fef33721b66e727ea7aef19fb5dea6edc3e16c6d7f08f04f5067dc9a2d0c01015c1af848a1fcd6c64eef039c9c5d8e737c0655a97b6bc876854a34ad94fcd29218524c6c7881bd1ae4a9279edc12f95720d8a010d9a4c7dd19a4415bed2687fb462d95da8436954b5fd82d92b98935650a1fd7fa215ba95e8b20d8594c50cb9a8bc683af32133c007bc0dff3edd36e0 c20688385891788de63a5adcbb\", \"difficulty\": \"0\", \"nonce\": \"feec6d57f31d8aee18889026e4e484d96de6b874013a1932018e809c60c45019033389671dcc2e3138a555705cec95e365d79d3e68a909efcf15d0d137770131\", \"parent\": \"00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\"}", "type": "block_hash"}

My whole code:

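import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;
import java.net.URLConnection;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.json.simple.JSONArray;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class TryToStream {

    static String url = "SorryICantShowYouThatOne";
    static String charset = "UTF-8";

    public static void main(String[] args) throws IOException, ParseException {
        JSONParser parser = new JSONParser();

        URL getURL = new URL(url + "get?start_at=");
        int counter = 0;
        boolean inputAvail = true;
        // clear the text file (opening a PrintWriter truncates it)
        PrintWriter pw = new PrintWriter("jsonFormatted.txt");
        pw.close();

        // fetch one page of results and parse it as a JSON array
        URL tmpURL = new URL(url + "get?start_at=" + counter);
        URLConnection connection = tmpURL.openConnection();
        InputStream is = connection.getInputStream();
        JSONArray json = (JSONArray) parser.parse(new BufferedReader(new InputStreamReader(is, charset)));
        is.close();

        // dump the raw array to a file for inspection
        BufferedWriter bw = new BufferedWriter(new FileWriter("jsonFormattedStream.txt"));
        bw.write(json.toJSONString());
        bw.close();

        Iterator iter = json.iterator();
        boolean flagForTesting = true;
        BufferedWriter bw2 = new BufferedWriter(new FileWriter("jsonFormatted.txt"));
        Pattern pattern = Pattern.compile("\"hash\": \"(.*?)\"");

        // while testing, only look at the first element
        while (iter.hasNext() && flagForTesting) {
            Matcher matcher = pattern.matcher(iter.next().toString());
            // check find() first: group(1) throws IllegalStateException when nothing matched
            if (matcher.find()) {
                System.out.println(matcher.group(1));
            } else {
                System.out.println("No match");
            }
            flagForTesting = false;
        }
        bw2.close();

        System.out.println("End");
    }
}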

When I use the suggested regex (the one now in the code above), I get no match.
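One possible cause, which the thread does not confirm: with json-simple, `iter.next()` is a parsed `JSONObject`, and its `toString()` writes each pair as `"key":value` with no space after the colon, while the pattern demands exactly one space. The printed output below may not show this if it was reformatted. A minimal sketch that loosens the colon with `\s*`; the class name and sample strings are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TolerantHashPattern {

    // \s* allows zero or more whitespace characters on either side of the colon,
    // so both "hash":"..." and "hash": "..." are accepted.
    static final Pattern HASH = Pattern.compile("\"hash\"\\s*:\\s*\"(.*?)\"");

    // Returns the captured hash value, or null when the input has no "hash" field.
    static String extractHash(String json) {
        Matcher m = HASH.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: compact form (no spaces) and spaced form.
        String compact = "{\"type\":\"block_hash\",\"hash\":\"00000e96\"}";
        String spaced = "{\"type\": \"block_hash\", \"hash\": \"00000e96\"}";
        System.out.println(extractHash(compact)); // 00000e96
        System.out.println(extractHash(spaced));  // 00000e96
    }
}
```

If the element is already a parsed `JSONObject`, reading the field directly, e.g. `((JSONObject) iter.next()).get("hash")`, avoids the regex altogether.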

Output of iter.next():

{"block": "{\"type\": \"block\", \"transactions\": [], \"timestamp\": \"2017-09-07T07:09:52.628676\", \"reward\": \"d5075b5d43cf97b73bd6483488f1f6a648dc83add93a37bb0817b17331fd51d989e2cf9fd3c8c0206fb89b84cf9e151b7d2123e4f6d71c95868bdfe1f4aa6b9e754a51a8e04bd49f5eec1931840315bc42844b715250534612da5e5809bdb14c496ad1a2d4b00823b80aacb7023667ca6923088b438dc5053d5bbf29a61620b28afa5d52d325ed8aa073a7f3a37e675c6bdf2dad09b809c8f3c60206392764458effb2c512d072af0cc7ea96058e1e19eccc72072939d5d16409843151b55607715f7ea9eff911914be9c88f1e719ed5cc5e95737977feeedbbd96b9150ce5a54c491aa94eab58df129445d89c9f8937c598ba95380a42c22e06ed2f0da4959b331e99e25554c122a095b2520ba3dcff6585c8c07cc6da9d3ad7e71a0ade2c6704c7c27aca3337916794efc4fa1a6e9784bbce1173ee7b408ece86a8a37f84706ed8092c06bb914510a97edffdda55ec09141bbfdf5af7029aa82e5f7e7da1cb1781426fef33721b66e727ea7aef19fb5dea6edc3e16c6d7f08f04f5067dc9a2d0c01015c1af848a1fcd6c64eef039c9c5d8e737c0655a97b6bc876854a34ad94fcd29218524c6c7881bd1ae4a9279edc12f95720d8a010d9a4c7dd19a4415bed2687fb462d95da8436954b5fd82d92b98935650a1fd7fa215ba95e8b20d8594c50cb9a8bc683af32133c007bc0dff3edd36e0 c20688385891788de63a5adcbb\", \"difficulty\": \"0\", \"nonce\": \"feec6d57f31d8aee18889026e4e484d96de6b874013a1932018e809c60c45019033389671dcc2e3138a555705cec95e365d79d3e68a909efcf15d0d137770131\", \"parent\": \"00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\"}", "type": "block_hash", "hash": "00000e96c46d15aeaaf9ef6f88a295a8f17207d4cd9ac074d2314680095befc854d5a00600602af2fe03a24b61566ca2d8a6b858b0af840309ae449316833923"}

1 Answer:

Answer 0 (score: 1)

Your regex is almost there!

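The problem with your regex is that `.*` is greedy: it matches everything in the string up to the last quote, so it runs all the way to "block_hash". You just need to tell it to match lazily, so that it stops at the first closing quote it encounters.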
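To see the difference concretely, here is a small self-contained sketch that runs the greedy and the lazy pattern against a shortened, made-up version of the JSON line (the class name GreedyVsLazy and the sample string are illustrative only):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {

    // Returns the full text matched by the first occurrence of regex in input, or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical stand-in for the question's JSON line.
        String input = "{\"hash\": \"00000e96\", \"type\": \"block_hash\"}";

        // Greedy: .* consumes the rest of the line, then backtracks to the LAST quote.
        String greedy = firstMatch("\"hash\": \".*\"", input);
        // Lazy: .*? grows one character at a time and stops at the FIRST closing quote.
        String lazy = firstMatch("\"hash\": \".*?\"", input);

        System.out.println(greedy); // "hash": "00000e96", "type": "block_hash"
        System.out.println(lazy);   // "hash": "00000e96"
    }
}
```

The greedy version swallows `"type": "block_hash"` as well, which is exactly the over-long match described above.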

"hash": ".*?" // notice the question mark!

Now this regex matches:

"hash": "00000e96c46d15aeaaf9ef6f88a295a8f17207d4cd9ac074d2314680095befc854d5a00600602af2fe03a24b61566ca2d8a6b858b0af840309ae449316833923"

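Another common way to stop at the first closing quote (a different technique from the lazy quantifier in this answer) is a negated character class, `[^"]*`, which cannot consume a quote at all. A sketch comparing both on a hypothetical shortened input; the class and constant names are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsNegatedClass {

    // Reluctant quantifier, as in the answer.
    static final Pattern LAZY = Pattern.compile("\"hash\": \".*?\"");
    // Negated character class: [^"]* matches any run of non-quote characters.
    static final Pattern NEGATED = Pattern.compile("\"hash\": \"[^\"]*\"");

    // Returns the full text of the first match, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical shortened input.
        String input = "{\"hash\": \"00000e96\", \"type\": \"block_hash\"}";
        System.out.println(firstMatch(LAZY, input));    // "hash": "00000e96"
        System.out.println(firstMatch(NEGATED, input)); // "hash": "00000e96"
    }
}
```

Neither form understands escaped quotes (`\"`) inside the value; for a hex hash that does not matter, but for arbitrary JSON strings a real parser is the safer choice.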
If you want to capture only what is inside the quotes, I suggest adding a capturing group:

"hash": "(.*?)"

You can use this regex like this:

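import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("\"hash\": \"(.*?)\"");
Matcher matcher = pattern.matcher(yourString);
if (matcher.find()) { // group(1) throws IllegalStateException if find() failed
    System.out.println(matcher.group(1));
}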
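The first snippet in the question reads a file through `Scanner.findWithinHorizon`, which returns the whole matched text rather than the group. Assuming the lazy pattern with a capturing group, the group can be read afterwards through `Scanner.match()`, which exposes the result of the last successful find. A minimal sketch over an in-memory string standing in for the file (the class name and sample are illustrative):

```java
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class ScannerHash {

    static final Pattern HASH = Pattern.compile("\"hash\": \"(.*?)\"");

    // Scans the text for the first "hash" field and returns capture group 1, or null.
    static String findHash(String text) {
        try (Scanner s = new Scanner(text)) {
            // horizon 0 means no limit: search the whole remaining input
            String whole = s.findWithinHorizon(HASH, 0);
            if (whole == null) {
                return null; // no match anywhere in the input
            }
            MatchResult r = s.match(); // result of the successful findWithinHorizon call
            return r.group(1);
        }
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the file contents; the question reads a File instead.
        String text = "{\"hash\": \"00000e96\", \"type\": \"block_hash\"}";
        System.out.println(findHash(text)); // 00000e96
    }
}
```

With a real file, `new Scanner(new File(...))` can replace the string constructor; the rest stays the same.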