I got the following result from a Google Custom Search query.
{
"kind": "customsearch#search",
"url": {
"type": "application/json",
"template": "https://www.googleapis.com/customsearch/v1?q={searchTerms}&num={count?}&start={startIndex?}&lr={language?}&safe={safe?}&cx={cx?}&cref={cref?}&sort={sort?}&filter={filter?}&gl={gl?}&cr={cr?}&googlehost={googleHost?}&c2coff={disableCnTwTranslation?}&hq={hq?}&hl={hl?}&nsc={nsc?}&siteSearch={siteSearch?}&siteSearchFilter={siteSearchFilter?}&exactTerms={exactTerms?}&excludeTerms={excludeTerms?}&linkSite={linkSite?}&orTerms={orTerms?}&relatedSite={relatedSite?}&dateRestrict={dateRestrict?}&lowRange={lowRange?}&highRange={highRange?}&searchType={searchType}&fileType={fileType?}&rights={rights?}&imgSize={imgSize?}&imgType={imgType?}&imgColorType={imgColorType?}&imgDominantColor={imgDominantColor?}&alt=json"
},
"queries": {
"nextPage": [
{
"title": "Google Custom Search - flowers",
"totalResults": 10300000,
"searchTerms": "flowers",
"count": 10,
"startIndex": 11,
"inputEncoding": "utf8",
"outputEncoding": "utf8",
"cx": "013036536707430787589:_pqjad5hr1a"
}
],
"request": [
{
"title": "Google Custom Search - flowers",
"totalResults": 10300000,
"searchTerms": "flowers",
"count": 10,
"startIndex": 1,
"inputEncoding": "utf8",
"outputEncoding": "utf8",
"cx": "013036536707430787589:_pqjad5hr1a"
}
]
},
"context": {
"title": "Custom Search"
},
"items": [
{
"kind": "customsearch#result",
"title": "Flower - Wikipedia, the free encyclopedia",
"htmlTitle": "<b>Flower</b> - Wikipedia, the free encyclopedia",
"link": "http://en.wikipedia.org/wiki/Flower",
"displayLink": "en.wikipedia.org",
"snippet": "A flower, sometimes known as a bloom or blossom, is the reproductive structure found in flowering plants (plants of the division Magnoliophyta, ...",
"htmlSnippet": "A <b>flower</b>, sometimes known as a bloom or blossom, is the reproductive structure <br> found in flowering plants (plants of the division Magnoliophyta, <b>... </b>",
"pagemap": {
"RTO": [
{
"format": "image",
"group_impression_tag": "prbx_kr_rto_term_enc",
"Opt::max_rank_top": "0",
"Opt::threshold_override": "3",
"Opt::disallow_same_domain": "1",
"Output::title": "<b>Flower</b>",
"Output::want_title_on_right": "true",
"Output::num_lines1": "3",
"Output::text1": "꽃은 식물 에서 씨 를 만들어 번식 기능을 수행하는 생식 기관 을 말한다. 꽃을 형태학적으로 관찰하여 최초로 총괄한 사람은 식물계를 24강으로 분류한 린네 였다. 그 후 꽃은 식물분류학상중요한 기준이 되었다.",
"Output::gray1b": "- 위키백과",
"Output::no_clip1b": "true",
"UrlOutput::url2": "http://en.wikipedia.org/wiki/Flower",
"Output::link2": "위키백과 (영문)",
"Output::text2b": " ",
"UrlOutput::url2c": "http://ko.wikipedia.org/wiki/꽃",
"Output::link2c": "위키백과",
"result_group_header": "백과사전",
"Output::image_url": "http://www.gstatic.com/richsnippets/b/fcb6ee50e488743f.jpg",
"image_size": "80x80",
"Output::inline_image_width": "80",
"Output::inline_image_height": "80",
"Output::image_border": "1"
}
]
}
}
]
}
How can I extract all the https links from the response above using Java?
Answer 0 (score: 2)
You could be lazy and skip parsing the JSON: treat the whole result as a string and just use a regular expression to match the URLs.
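import java.util.regex.Matcher;
import java.util.regex.Pattern;

// jsonResult holds the raw JSON response as a String
String httpLinkPattern = "https?://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
Pattern p = Pattern.compile(httpLinkPattern);
Matcher m = p.matcher(jsonResult);
while (m.find()) {
    System.out.println("Found http link: " + m.group());
}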
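The response repeats some URLs (for example, the same Wikipedia address appears under both "link" and "UrlOutput::url2"), so it may be handy to collect each match only once. Below is a minimal sketch of that idea using the same expression; the class name `UniqueLinkFinder` and the method `findLinks` are made up for illustration. Note that the character class stops at the first character it does not list, so the template URL is cut off at the first `{` and a path containing non-ASCII characters is cut off where they start.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class UniqueLinkFinder {
    // Same expression as in the answer above, compiled once and reused.
    private static final Pattern HTTP_LINK = Pattern.compile(
            "https?://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    // Returns each distinct link once, in order of first appearance.
    static Set<String> findLinks(String jsonResult) {
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HTTP_LINK.matcher(jsonResult);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        // Tiny stand-in for the real response: the same URL under two keys.
        String sample = "{\"link\": \"http://en.wikipedia.org/wiki/Flower\", "
                + "\"UrlOutput::url2\": \"http://en.wikipedia.org/wiki/Flower\"}";
        for (String link : findLinks(sample)) {
            System.out.println("Found http link: " + link);
        }
    }
}
```

A `LinkedHashSet` is used here so that duplicates are dropped while the order of first appearance is kept; a plain `List` works just as well if repeats are wanted.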
Answer 1 (score: 0)
If you would rather treat the response as a string and extract the URLs from it instead of using a JSON library, the following should do it.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Returns every URL-like substring found in the input, in order of appearance.
public List<String> extractUrls(String input)
{
    List<String> result = new ArrayList<String>();
    // Scheme or "www." prefix, optional credentials, host with a known TLD,
    // optional port, path, query string and fragment.
    Pattern pattern =
        Pattern.compile("\\b(((ht|f)tp(s?)\\:\\/\\/|~\\/|\\/)|www\\.)" + "(\\w+:\\w+@)?(([-\\w]+\\.)+(com|org|net|gov"
            + "|mil|biz|info|mobi|name|aero|jobs|museum" + "|travel|[a-z]{2}))(:[\\d]{1,5})?"
            + "(((\\/([-\\w~!$+|.,=]|%[a-f\\d]{2})+)+|\\/)+|\\?|#)?" + "((\\?([-\\w~!$+|.,*:]|%[a-f\\d]{2})+=?"
            + "([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)" + "(&(?:[-\\w~!$+|.,*:]|%[a-f\\d]{2})+=?" + "([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)*)*"
            + "(#([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)?\\b");
    Matcher matcher = pattern.matcher(input);
    while (matcher.find())
    {
        result.add(matcher.group());
    }
    return result;
}
Usage:
List<String> links = extractUrls(jsonResponseString);
for (String link : links)
{
System.out.println(link);
}
Answer 2 (score: 0)
Please use a JSON parser for this; I think that is the best approach. See the following link for a good example.
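The answer does not name a particular library, and in a real project one would normally reach for an established one (Jackson, Gson, or org.json, for instance). To stay dependency-free, the sketch below is not a full parser: it only walks the JSON text, decodes each string literal (including escapes such as `\/` and `\uXXXX`), and keeps the values that start with `http://` or `https://`. The class name `JsonStringLinkScanner` and its methods are invented for illustration, and the code assumes the input is valid JSON.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: a JSON-aware string scanner, not a complete JSON parser.
class JsonStringLinkScanner {
    // Collects every JSON string literal whose decoded value starts with http:// or https://.
    static List<String> extractLinks(String json) {
        List<String> links = new ArrayList<>();
        int i = 0;
        while (i < json.length()) {
            if (json.charAt(i) != '"') {
                i++; // outside a string literal: skip structural characters and numbers
                continue;
            }
            StringBuilder value = new StringBuilder();
            i++; // step past the opening quote
            while (i < json.length() && json.charAt(i) != '"') {
                char c = json.charAt(i);
                if (c == '\\' && i + 1 < json.length()) {
                    i = decodeEscape(json, i + 1, value);
                } else {
                    value.append(c);
                    i++;
                }
            }
            i++; // step past the closing quote
            String s = value.toString();
            if (s.startsWith("http://") || s.startsWith("https://")) {
                links.add(s);
            }
        }
        return links;
    }

    // Decodes the escape whose type character sits at index pos; returns the next index to read.
    private static int decodeEscape(String json, int pos, StringBuilder out) {
        char e = json.charAt(pos);
        switch (e) {
            case 'n': out.append('\n'); return pos + 1;
            case 't': out.append('\t'); return pos + 1;
            case 'r': out.append('\r'); return pos + 1;
            case 'b': out.append('\b'); return pos + 1;
            case 'f': out.append('\f'); return pos + 1;
            case 'u':
                if (pos + 5 <= json.length()) {
                    // Four hex digits follow the 'u'.
                    out.append((char) Integer.parseInt(json.substring(pos + 1, pos + 5), 16));
                    return pos + 5;
                }
                return pos + 1;
            default: // covers \" \\ and \/
                out.append(e);
                return pos + 1;
        }
    }
}
```

Because whole string values are taken rather than regex matches, a URL is returned complete even when it contains characters a hand-written character class would miss (the `{searchTerms}` placeholders in the template, or a non-ASCII path). The trade-off is that a URL embedded in the middle of a longer string, such as a snippet, is not picked up.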
Answer 3 (score: 0)
https?://.*?\.(org|com|net|gov)/.*?(?=")
This regex should work for your purposes: http://regexr.com?30nm2
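For completeness, here is a rough sketch of how that expression could be used from Java; the class name `QuotedUrlFinder` and the method `find` are assumptions made for this example. Inside a Java string literal the backslash and the double quote both need escaping. The lookahead `(?=")` stops each match just before the closing quote of the JSON string. As written, the expression only recognizes hosts ending in .org, .com, .net, or .gov, and it expects a `/` right after the host.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class QuotedUrlFinder {
    // The expression from the answer, with backslash and quote escaped for a Java string literal.
    private static final Pattern QUOTED_URL =
            Pattern.compile("https?://.*?\\.(org|com|net|gov)/.*?(?=\")");

    // Returns all matches in order of appearance.
    static List<String> find(String json) {
        List<String> urls = new ArrayList<>();
        Matcher m = QUOTED_URL.matcher(json);
        while (m.find()) {
            urls.add(m.group());
        }
        return urls;
    }
}
```

Calling `QuotedUrlFinder.find(jsonResponseString)` returns the matches as a list, which can then be printed or filtered further.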