Splitting a string on different characters and patterns

Date: 2016-03-17 00:18:24

Tags: java string split

I have this string containing HTML code:

String str = "<form action=''><span> First Name </span> <input type='text' id='fname' class='cls' size='40' required /> <span> [*]</span> <input type='submit' value='Submit' name='btn' /> <select name='slcEle' > <option value='opt'> Text</option> </select> <input type='radio' id='this'/> <button name='name' type='reset' value='val'> Text</button> <input type='range' min='0' max='100' name='grade'/> <button name='btnname' type='button'> Text</button>";

I want to split it so that each HTML element is a separate string. The output could be an array containing the following:

  

[0] = <form action=''>
[1] = <span> First Name </span>
[2] = <input type='text' id='fname' class='cls' size='40' required />
[3] = <span> [*]</span>
[4] = <input type='submit' value='Submit' name='btn' />
[5] = <select name='slcEle' >
[6] = <option value='opt'> Text</option>
[7] = </select>
and so on.

I can't simply use the split function because, as you can see, each piece has different characters and a different pattern.

Can anyone help?

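3 answers:

Answer 0 (score: 1)

If you want to handle HTML properly, I suggest using a library built for that purpose. I recommend Jsoup:

http://jsoup.org/

You will find that what you are trying to achieve becomes a thousand times easier.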

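Answer 1 (score: 0)

"I want to split it so that each HTML element is a separate string."

You can "mark" the initial string with a delimiter and then split on that delimiter. In the sample code below, the regex ignores text that contains only whitespace characters. Whitespace that precedes a tag or a piece of text stays attached to the following piece, as the output shows.

Sample code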

String str = "<form action=''><span> First Name </span> <input type='text' id='fname' class='cls' size='40' required /> <span> [*]</span> <input type='submit' value='Submit' name='btn' /> <select name='slcEle' > <option value='opt'> Text</option> </select> <input type='radio' id='this'/> <button name='name' type='reset' value='val'> Text</button> <input type='range' min='0' max='100' name='grade'/> <button name='btnname' type='button'> Text</button>";

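// Sentinel appended after every piece; it must not occur anywhere in the input.
final String DELIMITER = "<--->";
String[] separateStrings = str
        // Group 1 is either a whole tag or a run of text without '/', '<' or '>'.
        // The negative lookahead stops a match from starting on a whitespace character.
        .replaceAll("(?!\\s+)(<[^>]+>|[^/<>]+)", "$1" + DELIMITER)
        // split() takes a regex; "<--->" has no metacharacters, so it matches literally.
        .split(DELIMITER);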

int len = separateStrings.length;
for (int i = 0; i < len; i++) {
    System.out.format("[%d] = %s\n", i, separateStrings[i]);
}

Output

[0] = <form action=''>
[1] = <span>
[2] =  First Name 
[3] = </span>
[4] =  <input type='text' id='fname' class='cls' size='40' required />
[5] =  <span>
[6] =  [*]
[7] = </span>
[8] =  <input type='submit' value='Submit' name='btn' />
[9] =  <select name='slcEle' >
[10] =  <option value='opt'>
[11] =  Text
[12] = </option>
[13] =  </select>
[14] =  <input type='radio' id='this'/>
[15] =  <button name='name' type='reset' value='val'>
[16] =  Text
[17] = </button>
[18] =  <input type='range' min='0' max='100' name='grade'/>
[19] =  <button name='btnname' type='button'>
[20] =  Text
[21] = </button>
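
The output above breaks `<span> First Name </span>` into three pieces, while the question asks for it as a single string. One way to get closer to the requested output is to iterate over regex matches instead of splitting. This is only a rough sketch under some assumptions: elements whose content is plain text are kept whole, every other tag becomes its own piece, attribute values never contain `>`, and text that is not wrapped in such an element is dropped. The class name `HtmlPieces` and the method `pieces` are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HtmlPieces {
    // Alternative 1: an opening tag, text containing no '<', and the matching closing tag.
    // Alternative 2: any single tag (opening, closing or self-closing).
    private static final Pattern PIECE =
            Pattern.compile("<(\\w+)\\b[^>]*>[^<]*</\\1\\s*>|<[^>]+>");

    // Collects every match in document order; whitespace between pieces is skipped.
    static List<String> pieces(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = PIECE.matcher(html);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "<form action=''><span> First Name </span> "
                + "<input type='text' id='fname' class='cls' size='40' required /> "
                + "<select name='slcEle' > <option value='opt'> Text</option> </select>";
        List<String> out = pieces(str);
        for (int i = 0; i < out.size(); i++) {
            System.out.format("[%d] = %s%n", i, out.get(i));
        }
    }
}
```

Because the first alternative is tried before the second, a simple element such as `<option value='opt'> Text</option>` is returned whole, and container tags such as `<select name='slcEle' >` fall through to the single-tag alternative. For anything beyond simple, well-behaved markup a real HTML parser is still the safer choice.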

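Answer 2 (score: 0)

"I want to split it so that each HTML element is a separate string."

Here is an alternative answer that uses only the split() method (i.e. no delimiter is needed). Note that with this solution, text consisting only of whitespace characters is kept (see [13] in the output).

Sample code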

String str = "<form action=''><span> First Name </span> <input type='text' id='fname' class='cls' size='40' required /> <span> [*]</span> <input type='submit' value='Submit' name='btn' /> <select name='slcEle' > <option value='opt'> Text</option> </select> <input type='radio' id='this'/> <button name='name' type='reset' value='val'> Text</button> <input type='range' min='0' max='100' name='grade'/> <button name='btnname' type='button'> Text</button>";

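// Zero-width split points: right after every '>' (lookbehind) or right before every "</" (lookahead).
String[] separateStrings = str.split("(?<=>)|(?=</)");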

int len = separateStrings.length;
for (int i = 0; i < len; i++) {
    System.out.format("[%d] = %s\n", i, separateStrings[i]);
}

Output

[0] = <form action=''>
[1] = <span>
[2] =  First Name 
[3] = </span>
[4] =  <input type='text' id='fname' class='cls' size='40' required />
[5] =  <span>
[6] =  [*]
[7] = </span>
[8] =  <input type='submit' value='Submit' name='btn' />
[9] =  <select name='slcEle' >
[10] =  <option value='opt'>
[11] =  Text
[12] = </option>
[13] =  
[14] = </select>
[15] =  <input type='radio' id='this'/>
[16] =  <button name='name' type='reset' value='val'>
[17] =  Text
[18] = </button>
[19] =  <input type='range' min='0' max='100' name='grade'/>
[20] =  <button name='btnname' type='button'>
[21] =  Text
[22] = </button>
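
If the whitespace-only entry ([13] above) and the leading spaces on the other entries are unwanted, the result of split() can be post-processed. A minimal sketch, assuming it is acceptable to trim every piece and discard the ones that end up empty; the class name `SplitAndTrim` and the method `splitElements` are hypothetical names used only for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class SplitAndTrim {
    // Same lookaround split as above, followed by trimming and dropping empty pieces.
    static List<String> splitElements(String html) {
        return Arrays.stream(html.split("(?<=>)|(?=</)"))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String str = "<select name='slcEle' > <option value='opt'> Text</option> </select>";
        List<String> out = splitElements(str);
        for (int i = 0; i < out.size(); i++) {
            System.out.format("[%d] = %s%n", i, out.get(i));
        }
    }
}
```

Trimming also removes the spaces around text such as " First Name ", so skip the map step if that whitespace matters.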