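I am trying to extract information from HTML source code. When I test only the relevant part of the source, everything works. But when I test the whole source, Pattern.matches() returns false, even though the source contains text that fits the pattern.
Not working: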
String response = " <!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\"><html xmlns=\"http://www.w3.org/1999/xhtml\" xml:lang=\"de\" lang=\"de\"> <head> <meta http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-1\" /> <meta http-equiv=\"Content-Language\" content=\"de\" /> <meta name=\"description\" content=\"arenakampf.de - Die Herausforderung\" /> <meta name=\"keywords\" content=\"Arenakampf, AK, arenakampf.de, Arena, Gilden, Kampf\" /> <link rel=\"shortcut icon\" href=\"grafik/favicon.ico\" /> <link rel=\"stylesheet\" media=\"screen\" type=\"text/css\" href=\"css/reset.css\" /> <link rel=\"stylesheet\" media=\"screen\" type=\"text/css\" href=\"css/layout.css\" /> <link rel=\"stylesheet\" media=\"screen\" type=\"text/css\" href=\"css/content.css\" /> <link rel=\"stylesheet\" media=\"screen\" type=\"text/css\" href=\"css/jquery.ui.css\" /> <title> Arenakampf - Die Herausforderung! </title> <script type=\"text/javascript\" src=\"js/jquery.js\"></script> <script type=\"text/javascript\" src=\"js/jquery.ui.js\"></script> <script type=\"text/javascript\" src=\"js/jquery.plugin.tablesorter.js\"></script> <script type=\"text/javascript\" src=\"js/jquery.AK.functions.js\"></script> <script type=\"text/javascript\"> var rTime = 0; var pTime = 0; var showTime = 0; var name = \"Fredo\"; var d = new Date(); var rTime = Math.ceil(rTime); var pTime = Math.ceil(pTime); var startdate = Math.floor(d.getTime() / 1000); var rTimeEnd = startdate + rTime; var pTimeEnd = startdate + pTime; var rTimerActive = window.setInterval(\"rTimer()\", 100); var pTimerActive = 0; var skyBannerAppear = 0; var worldChat = 0; </script> <script type=\"text/javascript\" src=\"js/AK.WorldChat.js\"></script> </head> <body> <div class=\"wrap\" > <div class=\"root\"> <div class=\"top_header\"> <a title=\"arenakampf.de\" href=\"?site=start\"> <img src=\"grafik/background.top.jpg\" alt=\"Banner von Arenakampf.de\" /> </a> 
<form method=\"post\" id=\"quickchange\" action=\"?site=overview\"> <select name=\"quickchange\" onchange=\"javascript:document.getElementById('quickchange').submit()\"><option value=\"4339\">Aegis 100/100</option><option value=\"4340\" selected=\"selected\">Fredo 100/100</option><option value=\"4341\">Ymir 100/100</option><option value=\"4342\">Todos 100/100</option></select></form> <div class=\"information_header\"> <div class=\"char_information\"> Fredo<br /> Zwerg, 7 <div id=\"dek\"> <h2>Charakterinformationen</h2> Regenerationszeit: <span id=\"rcounter\">0:00</span><br /> Angriffsschutz: <span id=\"pcounter\">0:00</span><br /> Trefferpunkte: 106/106<br /> Geld: 2n, 12k </div> </div> <div class=\"message\"> <a href=\"http://forum.arenakampf.de/showthread.php?p=66276#post66276\" target=\"_blank\"><font size=\"2\" color=\"#cccccc\"><b>Newsthread updated</b></font></a> </div> <div class=\"social_networks\"> <a href=\"http://www.facebook.com/pages/Arenakampf/169687129766829?ref=hnav\" target=\"_blank\" title=\"ArenaKampf bei Facebook\"> <img src=\"grafik/like_facebook.png\" alt=\"ArenaKampf bei Facebook\" /> </a> <a href=\"https://plus.google.com/112562504705381850039?prsrc=3\" target=\"_blank\" title=\"ArenaKampf bei GooglePlus\"> <img src=\"https://ssl.gstatic.com/images/icons/gplus-16.png\" alt=\"Arenakampf bei GooglePlus\" /> </a> </div> <div class=\"online_counter\"> Online: 54, <a href=\"http://webchat.quakenet.org/?channels=arenakampf&uio=d4\" target=\"_blank\" title=\"Offizieller irc-Webchat von Arenakampf\">Q-Net: #arenakampf</a> </div> </div> </div> <div class=\"main\"> <div class=\"navigation\"> <div class=\"status_bars\"> <div id=\"health\"></div> <div id=\"ers\"></div> <div id=\"exp\" style=\"width: 73%;\"> </div> </div> <script type=\"text/javascript\"> hpwidth = 100; erswidth = 100; document.getElementById('ers').style.width = erswidth + '%'; document.getElementById('health').style.width = hpwidth + '%'; if (106 != 0) { setInterval( \"if (hpwidth + 
0.13888888888889 * 20 <= 100) {hpwidth += 0.13888888888889 * 20;document.getElementById('health').style.width = hpwidth + '%'}\", 20000); setInterval( \"if (erswidth + 0.27777777777778 * 20 <= 100) {erswidth += 0.27777777777778 * 20;document.getElementById('ers').style.width = erswidth + '%'}\", 20000); } </script> <ul > <li> <a href=\"?site=messages&pmpage=pmentry\">Nachrichten</a> </li> </ul> <ul> <li> <a href=\"?site=overview\">Übersicht</a> </li> <li> <a href=\"?site=editcharstats\">Werte</a> </li> <li> <a href=\"?site=skills\">Fertigkeiten</a> </li> <li> <a href=\"?site=achievements\">Erfolge</a> </li> </ul> <ul> <li> <a href=\"?site=itemshop\">Waffenladen</a> </li> <li> <a href=\"?site=itemshop&cat=magical\">Magieladen</a><br /> </li> <li> <a href=\"?site=matshop\">Materialladen</a><br /> </li> </ul> <ul> <li> <a href=\"?site=gilde\">Gilde</a> </li> <li> <a href=\"?site=worldchat\">Weltchat</a> </li> <li> <a href=\"?site=handel\">Marktplatz</a> </li> <li> <a href=\"?site=crafting\">Werkstatt</a> </li> </ul> <ul> <li> <a href=\"?site=kalender\">Kalender</a> </li> </ul> <ul> <li> <a href=\"?site=fight\">Arena</a> </li> <li> <a href=\"?site=battlelogs\">Kampfbuch</a> </li> <li> <a href=\"?site=money\">Kassenbuch</a> </li> <li> <a href=\"?site=adressbook\">Kontaktbuch</a> </li> </ul> <h2 id=\"section_spiel\">Spiel</h2> <ul class=\"section_spiel\"> <li> <a href=\"http://forum.arenakampf.de/showthread.php?t=4288\" target=\"_blank\">Regeln</a> </li> <li> <a href=\"?site=intro\">Intro</a> </li> <li> <a href=\"?site=changes\">News</a> </li> <li><a href=\"?site=umfrage\">Umfrage</a></li><li><a href=\"?site=messages&pmpage=newpm&toname=Support\">Support</a></li> </ul> <ul> <li> <a href=\"?site=faq\">Anleitung</a> </li> <li> <a href=\"http://forum.arenakampf.de/showthread.php?t=5151\" target=\"_blank\">Wegweiser</a> </li> <li> <a href=\"?site=ipcalculator\">IP-Berechnung</a> </li> <li> <a href=\"http://forum.arenakampf.de\" target=\"_blank\">Forum</a> </li> <li> <a 
href=\"?site=impressum\">Impressum</a> </li> </ul> <h2>Rankings</h2> <ul> <li> <a href=\"?site=craftrank\">Handwerker</a> </li> <li> <a href=\"?site=guildrank\">Gilden</a> </li> </ul><ul> <li> <a href=\"?site=account\">Einstellungen</a> </li> <li> <a href=\"?site=editchars\">Charakter</a> </li></ul> <ul> <li> <a href=\"?site=logout\">LOGOUT</a> </li> </ul> </div> <script type=\"text/javascript\"> setBackgroundAttachment(); </script> <div class=\"content\"> <br /><br /> <h2 class=\"player_name\">Fredo</h2><br /><br /><table class=\"tabletop\" width=\"85%\"> <tr> <td width=\"55%\"> <table class=\"tabletop2\" width=\"100%\"> <tr><td colspan=\"2\">Account</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Name:</td> <td width=\"45%\">eisfreak</td> </tr> <tr class=\"tablemid\"><td width=\"55%\">Ruhmpunkte:</td> <td width=\"45%\">8</td> </tr> <tr><td></td></tr> <tr><td colspan=\"2\">Charakterübersicht:</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Name:</td><td width=\"45%\">Fredo</td></tr> " +
"<tr class=\"tablemid\"><td width=\"55%\">Stufe:</td><td width=\"45%\">7</td></tr>" +
" <tr class=\"tablemid\"><td width=\"55%\">Erfahrung:</td><td width=\"45%\">8899</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Erfahrung bis zur nächsten Stufe:</td><td width=\"45%\">554</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Geld:</td><td width=\"45%\"><small>2n, 12k</small></td></tr> <tr class=\"tablemid\"><td width=\"55%\">Beruf:</td><td width=\"45%\">arbeitslos</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Berufserfahrung (Wert):</td><td width=\"45%\">0 <br /><small><i>(--- zu ---%)</i></small></td></tr> <tr class=\"tablemid\"><td width=\"55%\">Ranglistenplatz:</td><td width=\"45%\">---</td></tr> <tr><td></td></tr> <tr><td colspan='2'>Statistik:</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Kämpfe:</td><td width=\"45%\">93</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Davon gewonnen:</td><td width=\"45%\">86</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Davon verloren:</td><td width=\"45%\">7</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Verhältnis:</td><td width=\"45%\">92%</td></tr> </table> </td> <td align=\"center\" class=\"centered\"><img src=\"charpics/nopic.gif\" class=\"playerpic\" alt=\"Charbild\" /></td> </tr> <tr> <td width=\"45%\"> <table class=\"tabletop2\" width=\"100%\"> <tr><td colspan='2'>Resistenzen:</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Hieb:</td><td width=\"45%\">10</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Schlag:</td><td width=\"45%\">0</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Stich:</td><td width=\"45%\">-10</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Feuer:</td><td width=\"45%\">-10</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Frost:</td><td width=\"45%\">-10</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Licht:</td><td width=\"45%\">-10</td></tr> <tr class=\"tablemid\"><td width=\"55%\">Gift:</td><td width=\"45%...";
Working:
String response = "<tr class=\"tablemid\"><td width=\"55%\">Stufe:</td><td width=\"45%\">7</td></tr>";
The regex:
String attribute = "Stufe";
String attributeValue;
if (source.matches("<tr class=\"tablemid\"><td width=\"55%\">" + attribute + ":</td><td width=\"45%\">(.*?)</td></tr>")) {
    attributeValue = source.replaceAll("<tr class=\"tablemid\"><td width=\"55%\">" + attribute + ":</td><td width=\"45%\">(.*?)</td></tr>", "$1");
} else {
    attributeValue = "Error loading '" + attribute + "'";
}
Any suggestions on how to deal with this? Is the length of the string really the problem?
Answer 0 (score: 1)
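Besides sharing Hovercraft's opinion (JSoup really is that good), you are not using matches correctly. matches() checks that the regex matches the entire string; it does not check whether the string contains a match for the regex. I suppose you need something like
source.matches(".*<tr class=\"tablemid\"><td width=\"55%\">" + attribute + ":</td><td width=\"45%\">(.*?)</td></tr>.*")
instead. You have to double-check the greediness of .* though, and keep in mind that . does not match line terminators by default, so a source containing line breaks also needs the (?s) flag. Even if that works, the regex requires a lot of buffering, which makes it quite slow. So go with Jsoup, I promise you'll be happy!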
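As a regex-only alternative to wrapping the pattern in .*, the following is a minimal sketch that uses Matcher.find(), which searches for the pattern anywhere in the input instead of requiring the whole input to match. The class name AttributeExtractor, the helper extract, and the shortened sample page are assumptions made for illustration; they are not part of the original code. Pattern.quote is used so that attribute names containing regex metacharacters, such as "Berufserfahrung (Wert)", are treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttributeExtractor {

    // Hypothetical helper: returns the value cell of the row labelled with
    // the given attribute, or null when no such row exists in the source.
    static String extract(String source, String attribute) {
        Pattern pattern = Pattern.compile(
                "<tr class=\"tablemid\"><td width=\"55%\">"
                + Pattern.quote(attribute)
                + ":</td><td width=\"45%\">(.*?)</td></tr>");
        Matcher matcher = pattern.matcher(source);
        // find() scans for the pattern anywhere in the input,
        // whereas matches() requires the entire input to match.
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String row = "<tr class=\"tablemid\"><td width=\"55%\">Stufe:</td>"
                + "<td width=\"45%\">7</td></tr>";
        // Shortened stand-in for the full page, including line breaks.
        String page = "<html><body><table>\n" + row + "\n</table></body></html>";

        System.out.println(extract(row, "Stufe"));
        System.out.println(extract(page, "Stufe"));
        System.out.println(extract(page, "Rang"));
    }
}
```

Because find() stops at the first match and the group is read directly, there is no need for replaceAll or for leading and trailing .* around the pattern. This still depends on the exact markup of the row, so an HTML parser such as Jsoup remains the more robust choice.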
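Answer 1 (score: 0)
You could also try the following HTML Parser.
HTML Parser is a Java library used to parse HTML in either a linear or nested fashion. Primarily used for transformation or extraction, it features filters, visitors, custom tags and easy-to-use JavaBeans. It is a fast, robust and well-tested package.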