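I am using the JSOUP package to fetch Google search results and pick out entries with a specific title, such as a Facebook title. Here is my code; it prints the titles. From those titles I want to select the Facebook URL.
Program: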
package googlesearch;

import java.io.IOException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SearchRegexDiv {

    // Matches any title that contains "facebook", ignoring case.
    private static final Pattern FACEBOOK = Pattern.compile("facebook", Pattern.CASE_INSENSITIVE);

    public static void main(String[] args) throws IOException {
        String google = "http://www.google.com/search?q=";
        //String search = "stackoverflow";
        String search = "hortonworks";
        String charset = "UTF-8";
        String userAgent = "ExampleBot 1.0 (+http://example.com/bot)"; // Change this to your company's name and bot homepage!

        Elements links = Jsoup.connect(google + URLEncoder.encode(search, charset)).userAgent(userAgent).get().select(".g>.r>a");
        for (Element link : links) {
            String title = link.text();
            String url = link.absUrl("href"); // Google returns URLs in format "http://www.google.com/url?q=<url>&sa=U&ei=<someKey>".
            url = URLDecoder.decode(url.substring(url.indexOf('=') + 1, url.indexOf('&')), "UTF-8");
            if (!url.startsWith("http")) {
                continue; // Ads/news/etc.
            }
            // find() looks for "facebook" anywhere in the title; matches() would require the whole title to match.
            if (FACEBOOK.matcher(title).find()) {
                System.out.println("Done");
                // Last word of the title; this is the point where I want to split the string.
                String lastWord = title.substring(title.lastIndexOf(" ") + 1);
            }
            System.out.println("Title: " + title);
            System.out.println("URL: " + url);
        }
    }
}
Output:
Title: Hortonworks - Facebook logo
URL: https://www.facebook.com/hortonworks/
The output is a list of titles and URLs in the format shown above.
I am trying to match the titles that contain Facebook, and I want to split each match into two strings, like this:
String social_media = "facebook";
String org = "hortonworks";
Answer 0 (score: 0)
Use this code to split a String on more than one delimiter character.
Here is a demo that splits on multiple characters:
String word = "https://www.facebook.com/hortonworks/";
String[] array = word.split("[/.]"); // the character class splits on either '/' or '.'
for (String each : array) {
    System.out.println(each);
}
Output (each token is printed on its own line; the "//" after "https:" produces one empty token):
https:

www
facebook
com
hortonworks
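
To get from those tokens to the two strings the question asks for, something along these lines might work. This is only a sketch: the class name SocialLinkParser and the methods fromUrl and fromTitle are hypothetical, and it assumes the URL has the shape scheme://[www.]site.tld/org/ and the title has the shape "Org - Site ...", as in the sample output. It uses java.net.URI from the standard library instead of indexing into the split array, so the empty token and the "www" prefix do not shift the positions.

```java
import java.net.URI;
import java.util.Locale;

class SocialLinkParser {

    /**
     * Returns {socialMedia, org} for a profile URL, or null when the URL
     * does not have a host and at least one path segment.
     */
    static String[] fromUrl(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost(); // e.g. "www.facebook.com"
        String path = uri.getPath(); // e.g. "/hortonworks/"
        if (host == null || path == null) {
            return null;
        }
        String[] labels = host.split("\\.");
        if (labels.length < 2) {
            return null;
        }
        // Assumption: the site name is the label just before the top-level domain.
        String socialMedia = labels[labels.length - 2];
        // Assumption: the organisation is the first non-empty path segment.
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                return new String[] { socialMedia, segment };
            }
        }
        return null;
    }

    /**
     * Returns {socialMedia, org} for a title such as "Hortonworks - Facebook logo",
     * or null when the title contains no " - " separator.
     */
    static String[] fromTitle(String title) {
        String[] parts = title.split("\\s+-\\s+", 2);
        if (parts.length < 2) {
            return null;
        }
        String org = parts[0].trim().toLowerCase(Locale.ROOT);
        // First word after the separator is taken as the site name.
        String socialMedia = parts[1].trim().split("\\s+")[0].toLowerCase(Locale.ROOT);
        return new String[] { socialMedia, org };
    }

    public static void main(String[] args) {
        String[] fromUrl = fromUrl("https://www.facebook.com/hortonworks/");
        System.out.println("social_media = " + fromUrl[0] + ", org = " + fromUrl[1]);

        String[] fromTitle = fromTitle("Hortonworks - Facebook logo");
        System.out.println("social_media = " + fromTitle[0] + ", org = " + fromTitle[1]);
    }
}
```

Parsing the URL is probably the more robust of the two, since search result titles vary in wording while the profile URL keeps a stable structure.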