How do I split words using a Java regular expression?

Time: 2016-07-04 12:19:56

Tags: java regex

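I am using the jsoup package to run a search and pick out results with a specific title, such as a Facebook title. Here is my code; it prints each result's title and URL. From those titles I want to select the Facebook URL.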

Program:

package googlesearch;

import java.io.IOException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SearchRegexDiv {
  private static String REGEX = ".?[facebook]";
  public static void main(String[] args) throws IOException {

    Pattern p = Pattern.compile(REGEX);
    String google = "http://www.google.com/search?q=";
    //String search = "stackoverflow";
    String search = "hortonworks";
    String charset = "UTF-8";
    String userAgent = "ExampleBot 1.0 (+http://example.com/bot)"; // Change this to your company's name and bot homepage!

    Elements links = Jsoup.connect(google + URLEncoder.encode(search, charset)).userAgent(userAgent).get().select(".g>.r>a");

    for (Element link: links) {
      String title = link.text();
      String url = link.absUrl("href"); // Google returns URLs in format "http://www.google.com/url?q=<url>&sa=U&ei=<someKey>".
      url = URLDecoder.decode(url.substring(url.indexOf('=') + 1, url.indexOf('&')), "UTF-8");

      if (!url.startsWith("http")) {
        continue; // Ads/news/etc.
      }

      //.?facebook
      if (title.matches(REGEX)) {
        System.out.println("Done");
        title.substring(title.lastIndexOf(" ") + 1); //split the String  
        //(example.substring(example.lastIndexOf(" ") + 1));
      }
      System.out.println("Title: " + title);

      System.out.println("URL: " + url);
    }
  }
}

Output:

Title: Hortonworks - Facebook logo
URL: https://www.facebook.com/hortonworks/

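From the output I get a list of titles and URLs in the format shown above.

I am trying to match the titles that contain Facebook, and I want to split each match into two strings, like this: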

String social_media = "facebook";

String org = "hortonworks";
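
One thing worth noting about the program above: String.matches requires the whole string to match, and [facebook] is a character class that matches a single character from that set, so ".?[facebook]" can only match strings of one or two characters. A minimal sketch of the title split described above, assuming titles have the shape "Hortonworks - Facebook ..." (the class name TitleSplit and the helper splitTitle are hypothetical, introduced only for illustration):

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleSplit {
  // Assumed title shape: "<org> - Facebook ...", matched case-insensitively.
  private static final Pattern TITLE =
      Pattern.compile("^(.+?)\\s*-\\s*(facebook)\\b", Pattern.CASE_INSENSITIVE);

  /** Returns {org, socialMedia} in lower case, or null if the title does not match. */
  public static String[] splitTitle(String title) {
    Matcher m = TITLE.matcher(title);
    // find() looks for the pattern inside the title; matches() would need the whole title to match.
    if (!m.find()) {
      return null;
    }
    String org = m.group(1).trim().toLowerCase(Locale.ROOT);
    String socialMedia = m.group(2).toLowerCase(Locale.ROOT);
    return new String[] { org, socialMedia };
  }

  public static void main(String[] args) {
    String[] parts = splitTitle("Hortonworks - Facebook logo");
    if (parts != null) {
      System.out.println("org = " + parts[0]);
      System.out.println("social_media = " + parts[1]);
    }
  }
}
```

Titles that do not follow the assumed shape return null, so the caller can skip them in the same way the loop above skips non-http URLs.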

1 answer:

Answer 0 (score: 0)

Use this code to split a String on more than one delimiter character.

Here is a demo that splits a URL on both '/' and '.':

String word = "https://www.facebook.com/hortonworks/";
// The character class [/.] splits on either '/' or '.'.
String[] array = word.split("[/.]");
for (String each : array) {
  System.out.println(each);
}

Output (each token is printed on its own line; the blank line is the empty token between the two slashes of "https://"):

https:

www
facebook
com
hortonworks
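
Picking tokens out of the split result by position depends on the exact shape of the URL. As an alternative, here is a minimal sketch that parses the URL with java.net.URI and reads the site name from the host and the organisation from the first path segment. The class name FacebookUrlParts and the method parse are hypothetical, and the sketch assumes a simple host such as www.facebook.com with a single-label top-level domain:

```java
import java.net.URI;

public class FacebookUrlParts {
  /**
   * Returns {site, org} for a URL such as https://www.facebook.com/hortonworks/,
   * or null if the host or the first path segment is missing.
   * URI.create throws IllegalArgumentException for a malformed URL.
   */
  public static String[] parse(String url) {
    URI uri = URI.create(url);
    String host = uri.getHost();   // e.g. "www.facebook.com"
    String path = uri.getPath();   // e.g. "/hortonworks/"
    if (host == null || path == null) {
      return null;
    }
    String[] labels = host.split("\\.");
    if (labels.length < 2) {
      return null;
    }
    // Assumption: the label just before the top-level domain is the site name.
    String site = labels[labels.length - 2];
    for (String segment : path.split("/")) {
      if (!segment.isEmpty()) {
        return new String[] { site, segment }; // first non-empty path segment
      }
    }
    return null;
  }

  public static void main(String[] args) {
    String[] parts = parse("https://www.facebook.com/hortonworks/");
    System.out.println("social_media = " + parts[0]); // prints "social_media = facebook"
    System.out.println("org = " + parts[1]);          // prints "org = hortonworks"
  }
}
```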