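Parsing a String - HTTP string

Time: 2014-10-08 14:48:11

Tags: java string parsing html-parsing

I want to do something like the following, so that I am left with only the URL part of the string. I am having trouble with the quotation marks inside the string.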

    // This is what I read into a string:
    // <td width="118"><a href="research.html" class="navText style10 style12">
    // I want to parse this so that I am left with only research.html

    // I sometimes also get a string that contains:
    // <a href="http://www.ucalgary.ca" class="style18"><font size="3">University of Calgary</font></a></div>
    // From this string I want to keep http://www.ucalgary.ca

What I have so far does not work for every case. I would appreciate your help! My code is:

        public class Parse
        {
          public static void main(String[] args)
          {
            String h = "<a href=\"http://www.departmentofmedicine.com/policy.htm\">";
            int n = getIndexOf(h, '"', 0);


            String[] a = h.substring(n).split(">");
            String url = a[0].replaceAll("\"", "");
            //String value = a[1].replaceAll("</a", "");

            System.out.println(url + "  " );
          }

          public static int getIndexOf(String str, char c, int n)
          {
            int pos = str.indexOf(c, 0);
            while (n-- > 0 && pos != -1)
            {
              pos = str.indexOf(c, pos + 1);
            }
            return pos;
          }
        }

2 Answers:

Answer 0: (score: 0)

I would try Pattern and Matcher, like this:

    String s = "<a href=\"http://www.departmentofmedicine.com/policy.htm\">";

    Pattern p = Pattern.compile(".*href=\"([^\"]*).*");
    Matcher m = p.matcher(s);
    if(m.matches()) {
        System.out.println(m.group(1));
    }
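
The pattern above relies on `matches()`, which has to consume the entire input, hence the leading and trailing `.*`. As a minimal sketch, the same idea can be written with `find()` and run against the three strings from the question. The class name `HrefRegexDemo` and the method `extractHref` are hypothetical names chosen for illustration, and the sketch assumes the attribute value is always wrapped in double quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefRegexDemo {
    // Captures everything between href=" and the next double quote.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    // Returns the first href value in the input, or null if none is found.
    public static String extractHref(String html) {
        Matcher m = HREF.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "<td width=\"118\"><a href=\"research.html\" class=\"navText style10 style12\">",
            "<a href=\"http://www.ucalgary.ca\" class=\"style18\"><font size=\"3\">University of Calgary</font></a></div>",
            "<a href=\"http://www.departmentofmedicine.com/policy.htm\">"
        };
        for (String input : inputs) {
            System.out.println(extractHref(input));
        }
    }
}
```

With `find()` the first `href` in the string is returned; the greedy `.*` in the `matches()` version would pick the last one when a string holds several links.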

Answer 1: (score: 0)

A short snippet:

    String h = "<a href=\"http://www.departmentofmedicine.com/policy.htm\">";
    String url = h.substring(h.indexOf("http")).replace("\">", "");

    System.out.println(url);

The output will be: http://www.departmentofmedicine.com/policy.htm

Tested on my machine.

Please also post the other possible cases, so I can suggest a better solution.

To handle all three cases:

    //String h1 = "<a href=\"http://www.departmentofmedicine.com/policy.htm\">";
    //String h1 = "<a href=\"ucalgary.ca\" class=\"style18\"><font size=\"3\">University of Calgary</font></a>";
    String h1 = "<td width=\"118\"><a href=\"research.html\" class=\"navText style10 style12\">";

    // Take the text between href=" and the next double quote.
    String marker = "href=\"";
    int start = h1.indexOf(marker) + marker.length();
    String url = h1.substring(start, h1.indexOf("\"", start));

    System.out.println(url);

Uncomment the String h1 lines one at a time to check each of your cases.

The code above produces the following output:
research.html
http://www.departmentofmedicine.com/policy.htm
ucalgary.ca
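
The substring approach above assumes every input contains `href="` followed by a closing quote; when that is not the case it may return unrelated text or throw a `StringIndexOutOfBoundsException`. A minimal sketch of the same technique with both cases guarded might look like this. The class name `HrefIndexDemo` and the method `extractHref` are hypothetical, and returning `null` for a missing link is an assumption, not something the question specifies.

```java
public class HrefIndexDemo {
    private static final String MARKER = "href=\"";

    // Returns the first href value, or null when the marker
    // or its closing double quote is missing.
    public static String extractHref(String html) {
        int markerPos = html.indexOf(MARKER);
        if (markerPos == -1) {
            return null; // no href attribute in this string
        }
        int start = markerPos + MARKER.length();
        int end = html.indexOf('"', start);
        if (end == -1) {
            return null; // opening quote without a closing one
        }
        return html.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(extractHref("<a href=\"ucalgary.ca\" class=\"style18\">"));
        System.out.println(extractHref("<td width=\"118\">"));
    }
}
```

Searching for the closing quote from `start` onward, rather than from the beginning of the string, is what keeps earlier attributes such as `width="118"` from interfering.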