Question

How can I use Regex in Java to get a page's meta elements (title, description, image), the way Facebook does when you attach a URL?

Answer 1

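Here is a snippet that reads a web page and builds a small piece of HTML that displays the Open Graph image with the title wrapping around it on the right. If the OG tags are missing, it falls back to just the HTML title, so it can represent any web page.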

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public static String parsePageHeaderInfo(String urlStr) throws Exception {

    StringBuilder sb = new StringBuilder();
    Connection con = Jsoup.connect(urlStr);

    /* Sending a browser user agent is important: it tricks servers into sending us the LARGEST versions of the images.
       Constants.BROWSER_USER_AGENT is not shown here; presumably it holds a browser user-agent string. */
    con.userAgent(Constants.BROWSER_USER_AGENT);
    Document doc = con.get();

    // select() never returns null; when nothing matches, attr() returns "", so test for emptiness instead.
    String text = doc.select("meta[property=og:title]").attr("content");
    if (text.isEmpty()) {
        text = doc.title();   // fall back to the plain HTML <title>
    }

    String imageUrl = doc.select("meta[property=og:image]").attr("content");

    // Only emit the <img> tag when an og:image was actually found.
    if (!imageUrl.isEmpty()) {
        sb.append("<img src='");
        sb.append(imageUrl);
        sb.append("' align='left' hspace='12' vspace='12' width='150px'>");
    }

    sb.append(text);

    return sb.toString();
}

Answer 2

As Ishikawa Yoshi said, use JSoup.

Example:

Document doc = Jsoup.connect("http://example.com/").get();
for (Element meta : doc.select("meta")) {
    System.out.println("Name: " + meta.attr("name") + " - Content: " + meta.attr("content"));
}

This code is untested; I hope it helps.

Using RegEx to scrape a document is a bad idea; please read about it on Coding Horror.
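
To make the point about regular expressions concrete, here is a minimal sketch that uses only the standard library. The class name `NaiveOgRegex`, the pattern, and the sample HTML strings are assumptions made up for illustration; they are not from this answer. The pattern finds `og:title` when the attributes appear in the order it expects, but silently misses the same tag when `content` comes first, which is the kind of variation an HTML parser such as JSoup does not care about.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NaiveOgRegex {

    // Hypothetical naive pattern: assumes `property` precedes `content`
    // and that both attributes use double quotes.
    private static final Pattern OG_TITLE = Pattern.compile(
            "<meta\\s+property=\"og:title\"\\s+content=\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    // Returns the og:title content if the naive pattern matches, otherwise empty.
    public static Optional<String> ogTitle(String html) {
        Matcher m = OG_TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String expectedOrder = "<meta property=\"og:title\" content=\"Hello\">";
        String swappedOrder  = "<meta content=\"Hello\" property=\"og:title\">";

        System.out.println(ogTitle(expectedOrder).isPresent()); // prints "true"
        System.out.println(ogTitle(swappedOrder).isPresent());  // prints "false"
    }
}
```

Both inputs are equivalent HTML, yet only the first one is matched. Patching the pattern for swapped attributes, single quotes, extra attributes, or line breaks quickly becomes harder than calling a parser.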

Answer 3

How about this? The statement below selects every tag whose property starts with "og:". It is useful.

doc.select("meta[property^=og:]")

Answer 4

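I recommend this link: jsoup.org. If you still have not solved the problem, you can look here, where there are some examples of how to solve it with jsoup.
And here.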

Answer 5

I use JSOUP to get a Document object, and then use a method like the one below to get the tag for each property I am looking for.

String findTag(Document document, String property) {
    String tag = null;
    String cssQuery = "meta[property='og:" + property + "']";
    Elements elements = document.select(cssQuery);

    if (elements != null && elements.size() >= 1) {
        tag = elements.first().attr("content");
    }
    return tag;
}

I used this a lot, until I decided to combine the fetching (with JSOUP) and the parsing into a library called ogmapper.
