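Java regex to extract an image src from data inside a script tag

Asked: 2017-05-31 04:35:19

Tags: java

I need a Java regular expression that extracts the image src from the script tag in the code below. Please help me solve this. Thanks.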

<script language="javascript"><!--
            document.write('<a href="javascript:popupWindow(\'https://www.kitchenniche.ca/prepara-adjustable-oil-pourer-pi-5597.html?invis=0\')">
<img src="images/imagecache/prepara-adjustable-oil-pourer-1.jpg" border="0" alt="Prepara Adjustable Oil Pourer" title=" Prepara Adjustable Oil Pourer " width="170" height="175" hspace="5" vspace="5">
<br>
</a>');
--></script>

2 answers:

Answer 0 (score: 0)

Try this:

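import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractSrc {
    public static void main(String[] args) {
        // Sample input; attribute values are single-quoted here, unlike the HTML in the question.
        String mydata = "<script language='javascript'><!--document.write('<a href='javascript:popupWindow"
                + "(\'https://www.kitchenniche.ca/prepara-adjustable-oil-pourer-pi-5597.html?invis=0\')'><img "
                + "src='images/imagecache/prepara-adjustable-oil-pourer-1.jpg' border='0' alt='Prepara Adjustable Oil Pourer' "
                + "title=' Prepara Adjustable Oil Pourer ' width='170' height='175' hspace='5' vspace='5'><br></a>');</script>";
        // Lazily capture everything between src=' and the next single quote.
        Pattern pattern = Pattern.compile("src='(.*?)'");
        Matcher matcher = pattern.matcher(mydata);
        if (matcher.find()) {
            System.out.println(matcher.group(1));
        }
    }
}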
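
The pattern above only matches single-quoted values, while the HTML in the question uses double quotes. As a minimal sketch (not part of the original answer), the quote character can be captured and reused with a backreference so that either style matches. The class name `QuoteAgnosticSrc` and the helper `firstSrc` are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAgnosticSrc {
    // Group 1 captures the opening quote (' or "); \1 requires the same quote to close.
    // Group 2 is the attribute value.
    private static final Pattern SRC = Pattern.compile("\\bsrc\\s*=\\s*([\"'])(.*?)\\1");

    /** Returns the first src value found, or null if there is none. */
    static String firstSrc(String text) {
        Matcher m = SRC.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String doubleQuoted = "<img src=\"images/imagecache/prepara-adjustable-oil-pourer-1.jpg\" border=\"0\">";
        String singleQuoted = "<img src='images/imagecache/prepara-adjustable-oil-pourer-1.jpg' border='0'>";
        // Both calls print images/imagecache/prepara-adjustable-oil-pourer-1.jpg
        System.out.println(firstSrc(doubleQuoted));
        System.out.println(firstSrc(singleQuoted));
    }
}
```

Note that this matches any `src` attribute, not only those on `img` tags, so it is only suitable when the input is known to contain just the image.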

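Answer 1 (score: 0)

This regex finds the value of the src attribute only when src comes immediately after <img. If src is not the first attribute of the img tag, you need a more complex regex.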

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractImgSrc {
    public static void main(String[] args) {
        String s = "<script language=\"javascript\"><!--\r\n"
                + "            document.write('<a href=\"javascript:popupWindow(\\'https://www.kitchenniche.ca/prepara-adjustable-oil-pourer-pi-5597.html?invis=0\\')\">\r\n"
                + "<img src=\"images/imagecache/prepara-adjustable-oil-pourer-1.jpg\" border=\"0\" alt=\"Prepara Adjustable Oil Pourer\" title=\" Prepara Adjustable Oil Pourer \" width=\"170\" height=\"175\" hspace=\"5\" vspace=\"5\">\r\n"
                + "<br>\r\n" + "</a>');\r\n" + "--></script>";

        // Capture everything after <img src=" up to (not including) the next double quote.
        Pattern pattern = Pattern.compile("<img src=\"([^\"]+)");
        Matcher matcher = pattern.matcher(s);
        while (matcher.find()) {
            String group = matcher.group(1);
            System.out.println(group);
        }
    }
}

([^\"]+) matches one or more characters other than " and captures them in group 1. In a Java string literal, the " must be escaped as \".
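
For the case mentioned above where src is not the first attribute of the img tag, one possible "more complex" regex is sketched below. It is an illustration rather than the answerer's own solution: it assumes double-quoted attribute values and that no attribute value contains `>`. The class name `ImgSrcAnyPosition` and the method `findImgSrcs` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcAnyPosition {
    // <img, then lazily skip any characters inside the tag ([^>]*?),
    // then whitespace followed by src="...". Requiring \s before src
    // avoids matching attributes such as data-src.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*\"([^\"]+)\"",
            Pattern.CASE_INSENSITIVE);

    /** Returns the src values of all img tags found in the given text. */
    static List<String> findImgSrcs(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Same image as in the question, but with src moved behind other attributes.
        String s = "document.write('<a href=\"#\"><img border=\"0\" alt=\"Prepara Adjustable Oil Pourer\" "
                + "src=\"images/imagecache/prepara-adjustable-oil-pourer-1.jpg\" width=\"170\"><br></a>');";
        // Prints [images/imagecache/prepara-adjustable-oil-pourer-1.jpg]
        System.out.println(findImgSrcs(s));
    }
}
```

Regular expressions remain fragile for arbitrary HTML; if the markup varies more than this, a real HTML parser is likely the safer choice.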