Java Pattern.matches() fails with a long String

Time: 2012-12-09 15:09:24

Tags: java html regex

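I am trying to extract information from HTML source code. When I test against only the relevant part of the source, everything works. But when I test against the whole source, Pattern.matches() returns false even though the source contains text that matches the pattern.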

Does not work:

    String response = " <!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\"     \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\"><html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"de\" lang=\"de\">  <head>      <meta http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-1\" />      <meta http-equiv=\"Content-Language\" content=\"de\" />     <meta name=\"description\" content=\"arenakampf.de - Die Herausforderung\" />       <meta name=\"keywords\" content=\"Arenakampf, AK, arenakampf.de, Arena, Gilden, Kampf\" />          <link rel=\"shortcut icon\" href=\"grafik/favicon.ico\" />              <link rel=\"stylesheet\" media=\"screen\" type=\"text/css\" href=\"css/reset.css\" />       <link rel=\"stylesheet\" media=\"screen\" type=\"text/css\" href=\"css/layout.css\" />      <link rel=\"stylesheet\" media=\"screen\" type=\"text/css\" href=\"css/content.css\" />     <link rel=\"stylesheet\" media=\"screen\" type=\"text/css\" href=\"css/jquery.ui.css\" />    <title>            Arenakampf - Die Herausforderung!    
</title>       <script type=\"text/javascript\" src=\"js/jquery.js\"></script>     <script type=\"text/javascript\" src=\"js/jquery.ui.js\"></script>      <script type=\"text/javascript\" src=\"js/jquery.plugin.tablesorter.js\"></script>      <script type=\"text/javascript\" src=\"js/jquery.AK.functions.js\"></script>                                        <script type=\"text/javascript\">          var rTime = 0;      var pTime = 0;      var showTime = 0;      var name = \"Fredo\";      var d = new Date();      var rTime = Math.ceil(rTime);      var pTime = Math.ceil(pTime);      var startdate = Math.floor(d.getTime() / 1000);      var rTimeEnd = startdate + rTime;      var pTimeEnd = startdate + pTime;      var rTimerActive = window.setInterval(\"rTimer()\", 100);      var pTimerActive = 0;      var skyBannerAppear = 0;      var worldChat = 0;   </script>   <script type=\"text/javascript\" src=\"js/AK.WorldChat.js\"></script>     </head>  <body>       <div class=\"wrap\" >           <div class=\"root\">                <div class=\"top_header\">                  <a title=\"arenakampf.de\" href=\"?site=start\">                        <img src=\"grafik/background.top.jpg\" alt=\"Banner von Arenakampf.de\" />                  </a>                    <form method=\"post\" id=\"quickchange\" action=\"?site=overview\">                     <select name=\"quickchange\" onchange=\"javascript:document.getElementById('quickchange').submit()\"><option value=\"4339\">Aegis 100/100</option><option value=\"4340\" selected=\"selected\">Fredo 100/100</option><option value=\"4341\">Ymir 100/100</option><option value=\"4342\">Todos 100/100</option></select></form>                  <div class=\"information_header\">                      <div class=\"char_information\">                                                                                        Fredo<br />                             Zwerg, 7                                                        <div id=\"dek\">             
                       <h2>Charakterinformationen</h2>                                 Regenerationszeit:  <span id=\"rcounter\">0:00</span><br />                                 Angriffsschutz: <span id=\"pcounter\">0:00</span><br />                                 Trefferpunkte: 106/106<br />                                                                                                Geld: 2n, 12k                               </div>                                                                                                  </div>                      <div class=\"message\">                         <a href=\"http://forum.arenakampf.de/showthread.php?p=66276#post66276\" target=\"_blank\"><font size=\"2\" color=\"#cccccc\"><b>Newsthread updated</b></font></a>                       </div>                      <div class=\"social_networks\">                         <a href=\"http://www.facebook.com/pages/Arenakampf/169687129766829?ref=hnav\" target=\"_blank\" title=\"ArenaKampf bei Facebook\">                              <img src=\"grafik/like_facebook.png\" alt=\"ArenaKampf bei Facebook\" />                            </a>                                <a href=\"https://plus.google.com/112562504705381850039?prsrc=3\" target=\"_blank\" title=\"ArenaKampf bei GooglePlus\">                                                                <img src=\"https://ssl.gstatic.com/images/icons/gplus-16.png\" alt=\"Arenakampf bei GooglePlus\" />                             </a>                            </div>                          <div class=\"online_counter\">                                           Online: 54,                            <a href=\"http://webchat.quakenet.org/?channels=arenakampf&uio=d4\" target=\"_blank\" title=\"Offizieller irc-Webchat von Arenakampf\">Q-Net: #arenakampf</a>                       </div>                  </div>                          </div>              <div class=\"main\">                    <div class=\"navigation\">           
                               <div class=\"status_bars\">                         <div id=\"health\"></div>                           <div id=\"ers\"></div>                          <div id=\"exp\" style=\"width: 73%;\">                                                                              </div>                      </div>                          <script type=\"text/javascript\">          hpwidth = 100;          erswidth = 100;          document.getElementById('ers').style.width = erswidth + '%';          document.getElementById('health').style.width = hpwidth + '%';          if (106 != 0) {            setInterval( \"if (hpwidth + 0.13888888888889 * 20 <= 100) {hpwidth += 0.13888888888889 * 20;document.getElementById('health').style.width = hpwidth + '%'}\", 20000);            setInterval( \"if (erswidth + 0.27777777777778 * 20 <= 100) {erswidth += 0.27777777777778 * 20;document.getElementById('ers').style.width = erswidth + '%'}\", 20000);          }        </script>                       <ul >           <li>        <a href=\"?site=messages&pmpage=pmentry\">Nachrichten</a>           </li>       </ul>               <ul>            <li>                    <a href=\"?site=overview\">Übersicht</a>      </li>      <li>        <a href=\"?site=editcharstats\">Werte</a>      </li>            <li>        <a href=\"?site=skills\">Fertigkeiten</a>      </li>      <li>         <a href=\"?site=achievements\">Erfolge</a>      </li>       </ul>                                       <ul>            <li>            <a href=\"?site=itemshop\">Waffenladen</a>          </li>       <li>            <a href=\"?site=itemshop&cat=magical\">Magieladen</a><br />             </li>       <li>            <a href=\"?site=matshop\">Materialladen</a><br />           </li>      </ul>              <ul>              <li>                        <a href=\"?site=gilde\">Gilde</a>         </li>        <li>             <a href=\"?site=worldchat\">Weltchat</a>                        </li>       
<li>        <a href=\"?site=handel\">Marktplatz</a>         </li>       <li>            <a href=\"?site=crafting\">Werkstatt</a>        </li>      </ul>            <ul>        <li>            <a href=\"?site=kalender\">Kalender</a>         </li>      </ul>            <ul>        <li>            <a href=\"?site=fight\">Arena</a>       </li>       <li>            <a href=\"?site=battlelogs\">Kampfbuch</a>          </li>       <li>                        <a href=\"?site=money\">Kassenbuch</a>          </li>       <li>            <a href=\"?site=adressbook\">Kontaktbuch</a>        </li>       </ul>        <h2 id=\"section_spiel\">Spiel</h2>    <ul class=\"section_spiel\">        <li>            <a href=\"http://forum.arenakampf.de/showthread.php?t=4288\" target=\"_blank\">Regeln</a>                   </li>       <li>            <a href=\"?site=intro\">Intro</a>               </li>       <li>            <a href=\"?site=changes\">News</a>                  </li>       <li><a href=\"?site=umfrage\">Umfrage</a></li><li><a href=\"?site=messages&pmpage=newpm&toname=Support\">Support</a></li>   </ul>       <ul>        <li>            <a href=\"?site=faq\">Anleitung</a>                 </li>       <li>            <a href=\"http://forum.arenakampf.de/showthread.php?t=5151\" target=\"_blank\">Wegweiser</a>                    </li>       <li>                                <a href=\"?site=ipcalculator\">IP-Berechnung</a>                                        </li>       <li>            <a href=\"http://forum.arenakampf.de\" target=\"_blank\">Forum</a>      </li>       <li>            <a href=\"?site=impressum\">Impressum</a>       </li>   </ul>   <h2>Rankings</h2>   <ul>        <li>            <a href=\"?site=craftrank\">Handwerker</a>                          </li>       <li>            <a href=\"?site=guildrank\">Gilden</a>                  </li>   </ul><ul>   <li>        <a href=\"?site=account\">Einstellungen</a> </li>   <li>        <a href=\"?site=editchars\">Charakter</a>   
        </li></ul>      <ul>        <li>            <a href=\"?site=logout\">LOGOUT</a>         </li>       </ul>           </div>                          <script type=\"text/javascript\">                           setBackgroundAttachment();          </script>                       <div class=\"content\">                                 <br /><br />                  <h2 class=\"player_name\">Fredo</h2><br /><br /><table class=\"tabletop\" width=\"85%\">  <tr>    <td width=\"55%\">      <table class=\"tabletop2\" width=\"100%\">        <tr><td colspan=\"2\">Account</td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Name:</td>          <td width=\"45%\">eisfreak</td>        </tr>        <tr class=\"tablemid\"><td width=\"55%\">Ruhmpunkte:</td>          <td width=\"45%\">8</td>        </tr>        <tr><td></td></tr>        <tr><td colspan=\"2\">Charakter&uuml;bersicht:</td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Name:</td><td width=\"45%\">Fredo</td></tr>        " +
            "<tr class=\"tablemid\"><td width=\"55%\">Stufe:</td><td width=\"45%\">7</td></tr>" +
            "        <tr class=\"tablemid\"><td width=\"55%\">Erfahrung:</td><td width=\"45%\">8899</td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Erfahrung bis zur n&auml;chsten Stufe:</td><td width=\"45%\">554</td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Geld:</td><td width=\"45%\"><small>2n, 12k</small></td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Beruf:</td><td width=\"45%\">arbeitslos</td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Berufserfahrung (Wert):</td><td width=\"45%\">0 <br /><small><i>(--- zu ---%)</i></small></td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Ranglistenplatz:</td><td width=\"45%\">---</td></tr>        <tr><td></td></tr>        <tr><td colspan='2'>Statistik:</td></tr>        <tr class=\"tablemid\"><td width=\"55%\">K&auml;mpfe:</td><td width=\"45%\">93</td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Davon gewonnen:</td><td width=\"45%\">86</td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Davon verloren:</td><td width=\"45%\">7</td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Verh&auml;ltnis:</td><td width=\"45%\">92%</td></tr>       </table>    </td>    <td align=\"center\" class=\"centered\"><img src=\"charpics/nopic.gif\" class=\"playerpic\" alt=\"Charbild\" /></td>  </tr>  <tr>    <td width=\"45%\">      <table class=\"tabletop2\" width=\"100%\">         <tr><td colspan='2'>Resistenzen:</td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Hieb:</td><td width=\"45%\">10</td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Schlag:</td><td width=\"45%\">0</td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Stich:</td><td width=\"45%\">-10</td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Feuer:</td><td width=\"45%\">-10</td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Frost:</td><td width=\"45%\">-10</td></tr>        <tr class=\"tablemid\"><td width=\"55%\">Licht:</td><td width=\"45%\">-10</td></tr>        <tr 
class=\"tablemid\"><td width=\"55%\">Gift:</td><td width=\"45%...";

Works:

String response = "<tr class=\"tablemid\"><td width=\"55%\">Stufe:</td><td width=\"45%\">7</td></tr>";

Regex:

String attribute = "Stufe";         
String attributeValue;
if (source.matches("<tr class=\"tablemid\"><td width=\"55%\">" + attribute + ":</td><td width=\"45%\">(.*?)</td></tr>")){
    attributeValue = source.replaceAll("<tr class=\"tablemid\"><td width=\"55%\">" + attribute + ":</td><td width=\"45%\">(.*?)</td></tr>", "$1");
} else {
    attributeValue = "Error loading '" + attribute + "'";
}

Any suggestions on how to deal with this? Is the length of the string really the problem?

2 answers:

Answer 0 (score: 1)

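Apart from sharing Hovercraft's opinion (Jsoup really is that good), you are not using matches() correctly. matches() checks that the regex matches the entire string; it does not check whether the string contains a match for the regex. I think you need something like this:

    source.matches(".*<tr class=\"tablemid\"><td width=\"55%\">" + attribute + ":</td><td width=\"45%\">(.*?)</td></tr>.*")

You will have to double-check the greediness of .* though. Even if this works, the regex needs a lot of buffering, which makes it very slow. So go with Jsoup; I promise you will be happy!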
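If you want to stay with plain java.util.regex, the "contains" check described above can also be done with Matcher.find() instead of wrapping the pattern in .* on both sides. The following is only a minimal sketch under assumptions: the class name AttributeExtractor and the helper extract() are made up for illustration and do not appear in the question, and the sample page is a shortened stand-in for the real HTML.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
final class AttributeExtractor {

    // Looks for the table row of the given attribute anywhere in the source
    // and returns the contents of its value cell, if such a row exists.
    static Optional<String> extract(String source, String attribute) {
        Pattern rowPattern = Pattern.compile(
                "<tr class=\"tablemid\"><td width=\"55%\">"
                + Pattern.quote(attribute) // treat the attribute name literally
                + ":</td><td width=\"45%\">(.*?)</td></tr>");
        Matcher matcher = rowPattern.matcher(source);
        // find() scans for a match inside the string;
        // matches() would require the whole string to match.
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String row = "<tr class=\"tablemid\"><td width=\"55%\">Stufe:</td>"
                + "<td width=\"45%\">7</td></tr>";
        // Shortened stand-in for the full page (assumption for this demo).
        String page = "<html><body>\n<table>" + row + "</table>\n</body></html>";

        System.out.println(extract(row, "Stufe"));
        System.out.println(extract(page, "Stufe"));
        System.out.println(extract(page, "Unbekannt"));
    }
}
```

One more thing to keep in mind if you do use the .* wrapping with matches(): by default . does not match line terminators, so on multi-line HTML the pattern also needs the DOTALL flag (for example a leading (?s)). With find() that problem does not come up as long as the row itself sits on one line.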

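Answer 1 (score: 0)

You could also try the following HTML Parser:

  HTML Parser is a Java library used to parse HTML in either a linear or nested fashion. Primarily used for transformation or extraction, it features filters, visitors, custom tags and easy-to-use JavaBeans. It is a fast, robust and well-tested package.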