Parsing HTML in a string

Date: 2017-07-25 08:57:17

Tags: java

I have many strings containing HTML tags, for example:

<font face='verdana'>A great project with many companies
A <b>plumber</b> company as external.
An <i>electricity</i> company.
And a <font color='FF6600'>security </font>guard <u><i><b>during 2 weeks </b></i></u>.
</font>

For each tag, I would like a new string containing the tag and its content. In my example that would be:

string1 = "A great project with many companies A"
string2 = "<b>plumber</b>"
string3 = " company as external. An"
string4 = "<i>electricity</i>"
string5 = " company. And a "
string6 = "<font color='FF6600'>security </font>"
string7 = "guard"
string8 = "<u><i><b>during 2 weeks </b></i></u>"
string9 = "."
string10 = "</font>"

The first <font face='verdana'> and the last </font> can be removed.

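I tried using a Matcher, but the result is not what I want, or perhaps my regular expression is wrong. I also tried searching for the first opening tag and the first closing tag, but that did not work, because all font tags (<font face='verdana'> and <font color='FF6600'>) share the same closing tag (</font>).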
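One way around the shared-closing-tag problem is to count nesting depth instead of pairing the first opening tag with the first closing tag. The following is only a minimal sketch of that idea, not a full HTML parser. The class name `TagSplitter` and the method `split` are hypothetical names chosen for illustration. It assumes the outer wrapper has already been removed and that the markup is well formed, with no void or self-closing tags such as <br>. For real-world HTML, a dedicated parser library would likely be more robust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagSplitter {
    // Matches an opening tag such as <b> or <font color='x'>, or a closing tag such as </b>.
    // Group 1 is "/" for a closing tag and empty for an opening tag.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>");

    // Splits the input into top-level pieces: plain text runs and complete elements.
    // Assumption: well-formed markup without void tags like <br>.
    public static List<String> split(String html) {
        List<String> parts = new ArrayList<>();
        Matcher m = TAG.matcher(html);
        int depth = 0;        // how many elements are currently open
        int segmentStart = 0; // start index of the piece being collected
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            if (!closing) {
                if (depth == 0) {
                    // Flush the text that precedes this top-level element.
                    if (m.start() > segmentStart) {
                        parts.add(html.substring(segmentStart, m.start()));
                    }
                    segmentStart = m.start();
                }
                depth++;
            } else if (depth > 0) {
                depth--;
                if (depth == 0) {
                    // The top-level element is complete, nested tags included.
                    parts.add(html.substring(segmentStart, m.end()));
                    segmentStart = m.end();
                }
            }
            // A closing tag seen at depth 0 is left as part of the surrounding text.
        }
        if (segmentStart < html.length()) {
            parts.add(html.substring(segmentStart)); // trailing text
        }
        return parts;
    }

    public static void main(String[] args) {
        String inner = "A <b>plumber</b> company. And a "
                + "<font color='FF6600'>security </font>guard "
                + "<u><i><b>during 2 weeks </b></i></u>.";
        for (String part : split(inner)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

Because the element is only emitted when the depth returns to zero, a nested run such as <u><i><b>...</b></i></u> stays in one piece, and two font tags nested inside each other are not confused by their identical </font> closing tags.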

0 answers:

No answers