How do I parse this with JSoup (open to alternatives)

Time: 2014-02-26 03:51:06

Tags: java web-scraping jsoup

I'm fairly new to JSoup. I'm trying to parse HTML scraped from a website, along these lines:

.....
    <FONT COLOR=#2D8F26 FACE="Arial"><B>Claim:</B></FONT> &nbsp; Photograph shows a Chicago Bears fan holding a crude sign at the <NOBR>2006-07</NOBR> <NOBR>NFC championship</NOBR> game.
    <BR><BR>
    <NOINDEX>
    <FONT COLOR=#2D8F26 FACE="Arial"><B>Status:</B></FONT> &nbsp; <FONT COLOR=#FF0000 FACE="Arial"><B><I>True.</I></B></FONT>
    </NOINDEX>
    <BR><BR>
    <FONT COLOR=#2D8F26 FACE="Arial"><B>Example:</B></FONT> &nbsp; <FONT COLOR=#2D8F26 FACE="Trebuchet MS,Bookman Old Style,Arial"><I>[Collected via e-mail, January 2007]</I></FONT>
    <BR><BR>
    <TABLE WIDTH=400 ALIGN=CENTER BORDER=0 BGCOLOR=#000000><TR><TD BGCOLOR=#EAF2E5>
    <FONT FACE="Verdana" SIZE=2">
    <DIV STYLE="text-align: justify; margin-top: 10px; margin-bottom: 10px; margin-left: 15px; margin-right: 15px">
    The attached photo has been circulating around the Gulf Coast region for a couple of days now (since Saturday's Bears-Saints game). Do you have any word on whether it is authentic or doctored? Was this individual really that tasteless and crude?
    <BR><BR>
    <CENTER>
......

I would like to produce output along these lines:
Claim :Photograph shows a Chicago Bears fan holding a crude sign at the 2006-07 NFC championship game.
Status:True.
Example:The attached photo has been circulating around the Gulf Coast region for a couple of days now (since Saturday's Bears-Saints game). Do you have any word on whether it is authentic or doctored? Was this individual really that tasteless and crude?

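The JSoup documentation shows ways to get information based on tags, but how do I get the desired output with JSoup? Any samples, or alternatives, would be appreciated.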

1 answer:

Answer 0: (score: 3)

I think you just want to get the text part by stripping out the HTML. This should work:

// yourInputString is a String variable holding the HTML to parse
String text = Jsoup.parse(yourInputString).text();
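
Note that text() returns a single flat string, so the "Claim", "Status" and "Example" parts still have to be separated to get the output shown in the question. Here is a minimal sketch of one way to do that with plain string handling, assuming the labels appear literally as "Claim:", "Status:" and "Example:" and nowhere else in the text. The SectionSplitter class and its split method are hypothetical helpers, not part of JSoup, and dropping the leading bracketed note (such as "[Collected via e-mail, January 2007]") is an assumption based on the desired output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits the flat string returned by Jsoup's text()
// into labelled sections. Not part of Jsoup itself.
class SectionSplitter {

    // Assumed labels, taken from the sample HTML in the question.
    private static final Pattern LABEL = Pattern.compile("\\b(Claim|Status|Example):");

    static Map<String, String> split(String text) {
        // A non-breaking space from &nbsp; may survive in the text,
        // so treat it as an ordinary space.
        String flat = text.replace('\u00A0', ' ');
        Map<String, String> sections = new LinkedHashMap<>();
        Matcher m = LABEL.matcher(flat);
        String label = null;
        int valueStart = 0;
        while (m.find()) {
            if (label != null) {
                // Everything between the previous label and this one is the value.
                sections.put(label, clean(flat.substring(valueStart, m.start())));
            }
            label = m.group(1);
            valueStart = m.end();
        }
        if (label != null) {
            sections.put(label, clean(flat.substring(valueStart)));
        }
        return sections;
    }

    // Trims whitespace and drops a leading "[...]" note, if there is one.
    private static String clean(String value) {
        return value.trim().replaceFirst("^\\[[^\\]]*\\]\\s*", "");
    }

    public static void main(String[] args) {
        // In real use this string would come from Jsoup.parse(html).text().
        String text = "Claim: Photograph shows a fan holding a sign. "
                + "Status: True. "
                + "Example: [Collected via e-mail, January 2007] The attached photo has been circulating.";
        split(text).forEach((label, value) -> System.out.println(label + ":" + value));
    }
}
```

If the page layout is stable, selecting elements with JSoup and walking their siblings may be more robust than splitting on labels, but that depends on markup not shown in the excerpt.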