Find and replace multiple URLs in one line

Time: 2013-06-12 02:56:25

Tags: java url url-shortener

My goal is to create a URL shortener. It works, except when I enter two URLs on one line.

For example, if I enter "laskjdflas www.google.com lakdsjfsa www.google.ca", I get this response:

  

Please enter in a URL to shorten

laskjdf www.google.ca lksadjf www.google.com

laskjdf http://aman207.tk/9 lksadjf http://aman207.tk/9

laskjdf htt://aman207.tk/-4gi5 lksadjf htt://aman207.tk/-4gi5

(I know the last two links are missing the "p".)

Here is my code:

Scanner keyboard=new Scanner(System.in);
System.out.println("Please enter in a URL to shorten");
URLget=keyboard.nextLine();
String originalMessage=URLget;

Pattern p = Pattern.compile("(?i)\\b((?:https?://|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’]))");
Matcher m = p.matcher(URLget);
StringBuffer sb = new StringBuffer();
while (m.find())
{
   URLget=m.group(1);
   m.appendReplacement(sb, "");
   sb.append(URLget);
   m.appendTail(sb);
   String URL="http://www.aman207.tk/yourls-api.php?signature=0a88314b95&action=shorturl&url="+ URLget;
   if (URLget.startsWith("http://")||URLget.startsWith("www."))
   {
       try {
           DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
           DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
           Document doc = docBuilder.parse(new InputSource(new URL(URL).openStream()));

           NodeList nodeList = doc.getElementsByTagName("shorturl");

           for (int temp = 0; temp < nodeList.getLength(); temp++)
           {
               Node nNode = nodeList.item(temp);
               Element eElement = (Element) nNode;
               if(eElement.getAttribute("shorturl") != null)
               {
                   String findShortURL= eElement.getTextContent();
                   String finalMessage = originalMessage.replaceAll("(?:http://|www.?)[\\w/%.-]+", findShortURL);
                   System.out.println(finalMessage);
               }
            }
        }
    }
}

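What I need is for it to replace every URL on a line. Does anyone have any suggestions? Thanks!

Edit:

Input: random words [URL to shorten (URL 1)] more random words [URL to shorten (URL 2)]

Output:

same random words [shortened URL 1] same random words [shortened URL 1 (this is the same shortened URL as for the first URL; I need it to look like the expected output)]

Expected output:

same random words [shortened URL 1] same random words [shortened URL 2]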

2 answers:

Answer 0: (score: 1)

Replace the if statement with:

if(eElement.getAttribute("shorturl") != null)
{
    String findShortURL= eElement.getTextContent();
    // replace() substitutes the literal URL; replaceAll() would treat URLget as a regex
    originalMessage = originalMessage.replace(URLget, findShortURL);
    System.out.println(originalMessage);
}

Move the println outside the for loop so that it prints only once.
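
To illustrate the difference, here is a minimal sketch that contrasts the generic-pattern replaceAll from the question with a per-URL literal replacement. The class name, method names, and the short.example addresses are hypothetical and stand in for the real shortener responses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LiteralReplaceSketch {

    // What the question's code effectively does: every URL-like token
    // is replaced by the same short URL.
    static String replaceAllWithOne(String message, String shortUrl) {
        return message.replaceAll("(?:http://|www.?)[\\w/%.-]+", shortUrl);
    }

    // Per-URL replacement: each original URL is swapped for its own short URL.
    // String.replace works on literal text, so regex metacharacters in the
    // URL (".", "?", "+") are not interpreted.
    static String replaceEach(String message, Map<String, String> shortUrls) {
        String result = message;
        for (Map.Entry<String, String> entry : shortUrls.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "foo www.google.com bar www.google.ca";

        // Hypothetical short URLs, standing in for the API responses
        Map<String, String> shortUrls = new LinkedHashMap<>();
        shortUrls.put("www.google.com", "http://short.example/1");
        shortUrls.put("www.google.ca", "http://short.example/2");

        System.out.println(replaceAllWithOne(input, "http://short.example/1"));
        System.out.println(replaceEach(input, shortUrls));
    }
}
```

One caveat with literal replacement: if one URL is a prefix of another in the same line, the shorter one can clobber part of the longer one. Replacing at the match positions found by the Matcher avoids that.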

Answer 1: (score: 0)

I figured it out myself.

Here is the working code:

Pattern p = Pattern.compile("(?i)\\b((?:https?://|www\\d{0,3}[.]|[a-z0-9.\\-]+[.][a-z]{2,4}/)(?:[^\\s()<>]+|\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\))+(?:\\(([^\\s()<>]+|(\\([^\\s()<>]+\\)))*\\)|[^\\s`!()\\[\\]{};:'\".,<>?«»“”‘’]))");
Matcher m = p.matcher(URLget);
StringBuffer sb = new StringBuffer();
while (m.find())
{
    URLget = m.group(1);
    String URL = "http://www.aman207.tk/yourls-api.php?signature=0a88314b95&action=shorturl&url=" + URLget;
    if (URLget.startsWith("http://") || URLget.startsWith("www."))
    {
        try {
            DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
            Document doc = docBuilder.parse(new InputSource(new URL(URL).openStream()));

            NodeList nodeList = doc.getElementsByTagName("shorturl");

            for (int temp = 0; temp < nodeList.getLength(); temp++) {
                Node nNode = nodeList.item(temp);
                Element eElement = (Element) nNode;
                if (eElement.getAttribute("shorturl") != null)
                {
                    // Swap the matched URL for the short URL returned by the API
                    URLget = eElement.getTextContent();
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
            System.err.println("Error occurred");
        } catch (SAXException e) {
            System.err.println("You either entered in an invalid URL, or our URL shortener services are down. Please try again.");
        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        }
    }
    // Copy the text before this match, then append the (possibly shortened) URL
    m.appendReplacement(sb, "");
    sb.append(URLget);
}
m.appendTail(sb);
return (sb.toString());
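
The core of this fix is the appendReplacement/appendTail loop: each match is replaced at its own position, and appendTail is called once after the loop. A self-contained sketch of just that technique follows, with the network call replaced by a pluggable function. The class name, the simplified URL pattern, and the short.example addresses are assumptions for illustration; the pattern is not equivalent to the long one above and will, for example, keep trailing punctuation as part of the URL.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ShortenAllSketch {

    // Simplified stand-in for the URL pattern used in the answer (an assumption)
    private static final Pattern URL_PATTERN =
            Pattern.compile("(?i)\\b(?:https?://|www\\.)[^\\s<>()]+");

    // Replaces every URL in the message with whatever the shortener returns for it.
    static String shortenAll(String message, Function<String, String> shortener) {
        Matcher m = URL_PATTERN.matcher(message);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String shortUrl = shortener.apply(m.group());
            // quoteReplacement keeps "$" and "\" in the short URL from being
            // read as group references
            m.appendReplacement(sb, Matcher.quoteReplacement(shortUrl));
        }
        // Append the text after the last match exactly once, outside the loop
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical shortener: numbers the URLs instead of calling a real API
        int[] counter = {0};
        Function<String, String> fakeShortener =
                url -> "http://short.example/" + (++counter[0]);

        System.out.println(
                shortenAll("laskjdf www.google.ca lksadjf www.google.com", fakeShortener));
    }
}
```

Passing the shortener as a Function keeps the replacement loop testable without network access; the real implementation would call the YOURLS API inside that function and fall back to the original URL on failure.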