Question

How can I extract only the img URL from a string in Java?

<img src="http://www.moneycontrol.com/news_image_files/2014/b/bull_16-9_356x200_200_0558.jpg" alt="It may be too early to give up on bull market in equities" title="It may be too early to give up on bull market in equities" border="0" width="75" height="75" align=" left" hspace="5"

Answer 1

Try this:

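String allContent = "<your url content>";
// Everything after the first "http:" (assumes an http URL, not https).
String rawUrl = allContent.split("http:")[1];
// split() takes a regex, so the dot is escaped to match a literal ".jpg".
String partURL = rawUrl.split("\\.jpg")[0];
String finalURL = "http:" + partURL + ".jpg";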

This is not a good general approach, but it works fine for the example you provided.

Answer 2

How about a regular expression? .*?src="([^"]+)".* captures everything inside src.

.*?src="([^"]+)".*

For HTML or XML it is better to use a real parser. If your input is very limited and specific, this may be enough.
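
For reference, here is a minimal sketch of how the pattern from this answer could be applied with java.util.regex. The class name SrcExtractor and the method extractSrc are made up for illustration, and the DOTALL flag is my own addition so that input spanning several lines still matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SrcExtractor {
    // Pattern from the answer; group 1 is the text between the quotes after src=.
    // DOTALL (an assumption, not part of the answer) lets '.' match newlines too.
    private static final Pattern SRC =
            Pattern.compile(".*?src=\"([^\"]+)\".*", Pattern.DOTALL);

    // Hypothetical helper: returns the first src value, or null if there is none.
    static String extractSrc(String html) {
        Matcher m = SRC.matcher(html);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String tag = "<img src=\"http://www.moneycontrol.com/news_image_files/2014/b/bull_16-9_356x200_200_0558.jpg\" border=\"0\"";
        System.out.println(extractSrc(tag));
    }
}
```

Because the quantifier before src is lazy, this returns the first src attribute in the input, whichever element it belongs to.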

Answer 3

If you are working with XML, it is better to use a parser. If all you have is a string, you can use the following snippet:

class imgUrl
{
    public static void main(String[] args)
    {
        String tag = "<img src=\"http://www.moneycontrol.com/news_image_files/2014/b/bull_16-9_356x200_200_0558.jpg\" alt=\"It may be too early to give up on bull market in equities\" title=\"It may be too early to give up on bull market in equities\" border=\"0\" width=\"75\" height=\"75\" align=\" left\" hspace=\"5\"";
        // Index of the first character after src=" (5 is the length of src=").
        int start = tag.indexOf("src=\"") + 5;
        // The URL ends at the next double quote.
        String url = tag.substring(start, tag.indexOf("\"", start));
        System.out.println("Url is " + url);
    }
}

Answer 4

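Note that src can also be an attribute of script. If you are parsing HTML source, make sure a script element's src is not what you end up matching, so be careful with that element. This regular expression extracts what you need: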

(.*?)<img(.*?) src=\"(.[^\"]*)\"(.*)
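
As a rough sketch, this is one way the expression above might be used from Java, where group 3 holds the URL. The names ImgSrcExtractor and extractImgSrc are hypothetical, and the DOTALL flag is an assumption added so that multi-line HTML can match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcExtractor {
    // The expression above as a Java string literal; group 3 is the src value.
    // DOTALL is an added assumption so that '.' also matches line breaks.
    private static final Pattern IMG_SRC =
            Pattern.compile("(.*?)<img(.*?) src=\"(.[^\"]*)\"(.*)", Pattern.DOTALL);

    // Hypothetical helper: src of the first img tag, or null if nothing matches.
    static String extractImgSrc(String html) {
        Matcher m = IMG_SRC.matcher(html);
        return m.matches() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        // The script's src comes first but is skipped, because the match must pass "<img".
        String html = "<script src=\"app.js\"></script><img alt=\"x\" src=\"pic.jpg\">";
        System.out.println(extractImgSrc(html)); // prints pic.jpg
    }
}
```

One caveat: group 2 is not limited to the inside of the img tag, so if an img has no src of its own the match can run on to a src in a later element.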

Extract some characters from a string in Java
