Getting the image URL from an RSS feed with the ROME library

Time: 2016-04-20 12:03:08

Tags: java xml rss rome

I have an RSS file at the address below:

http://teamcity/app/rest/builds/?locator=property:(name:<name>,value:<value>),lookupLimit:1000

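I want to get the image URL of each item. I am using the ROME library but have not found any solution. How can I get an item's image URL with ROME?

4 answers:

Answer 0 (score: 4)

To get the image tag, I built a new RSS parser as follows: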

public class NewRssParser extends RSS094Parser implements WireFeedParser {

public NewRssParser() {
    this("rss_2.0");
}

protected NewRssParser(String type) {
    super(type);
}

protected String getRSSVersion() {
    return "2.0";
}

protected boolean isHourFormat24(Element rssRoot) {
    return false;
}

protected Description parseItemDescription(Element rssRoot, Element eDesc) {
    Description desc = super.parseItemDescription(rssRoot, eDesc);
    desc.setType("text/html"); // change as per
                                // https://rome.dev.java.net/issues/show_bug.cgi?id=26
    return desc;
}

public boolean isMyType(Document document) {
    boolean ok;
    Element rssRoot = document.getRootElement();
    ok = rssRoot.getName().equals("rss");
    if (ok) {
        ok = false;
        Attribute version = rssRoot.getAttribute("version");
        if (version != null) {
            // At this point, as far ROME is concerned RSS 2.0, 2.00 and
            // 2.0.X are all the same, so let's use startsWith for leniency.
            ok = version.getValue().startsWith(getRSSVersion());
        }
    }
    return ok;
}

@Override
public Item parseItem(Element arg0, Element arg1) {
    Item item = super.parseItem(arg0, arg1);

    Element imageElement = arg1.getChild("image", getRSSNamespace());
    if (imageElement != null) {
        String imageUrl = imageElement.getText();

        Element urlElement = imageElement.getChild("url");
        imageUrl = urlElement != null ? urlElement.getText() : imageUrl;

        Enclosure enc = new Enclosure();
        enc.setType("image");
        enc.setUrl(imageUrl);

        item.getEnclosures().add(enc);
    }

    return item;
}

}

The class overrides the parseItem method, adds code that reads the image element, and adds the image's URL to the item's enclosures.

Then add the following line to the rome.properties file:

WireFeedParser.classes = [package name].NewRssParser

Example:

WireFeedParser.classes = ir.armansoft.newscommunity.newsgathering.parser.impl.NewRssParser

Answer 1 (score: 1)

ROME does not expose the <image> tag because it does not belong to the namespace it is in, so the feed isn't valid:

line 18, column 3: Undefined item element: image (29 occurrences) [help]
            <image>http://www.saipanews.com/media/image/%D8%AA%D9%88%D9%84%D9%8A%D8%A ...

If the image tag were in a different namespace, like this:

<image:image>http://www.saipanews.com/media/image/%D8%AA%D9%88%D9%84%D9%8A%D8%AF/2.jpg</image:image>

you could get the tag this way:

for(SyndEntry entry : feed.getEntries()) {
    for (Element element : entry.getForeignMarkup()) {
        System.out.println("element: " + element.toString());
    }
}

The result would be:

element: [Element: <image:image [Namespace: http://purl.org/rss/1.0/modules/image/]/>]

Unless the feed is fixed, there currently seems to be no way to get the image URL through the ROME library.
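To illustrate the namespace point independently of ROME, here is a minimal sketch that uses only the JDK's DOM parser (not ROME or JDOM) to read the text of a namespaced image element. The class name NamespacedImageSketch, the sample feed, and the example.com URL are assumptions made for illustration; the namespace URI is the one printed in the output above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of ROME.
class NamespacedImageSketch {
    static final String IMAGE_NS = "http://purl.org/rss/1.0/modules/image/";

    // Returns the text of every <image:image> element bound to IMAGE_NS.
    static List<String> imageUrls(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for lookups by namespace URI
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList images = doc.getElementsByTagNameNS(IMAGE_NS, "image");
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < images.getLength(); i++) {
                urls.add(images.item(i).getTextContent().trim());
            }
            return urls;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample feed for illustration only.
        String xml = "<rss version=\"2.0\" xmlns:image=\"" + IMAGE_NS + "\"><channel>"
                + "<item><title>a</title>"
                + "<image:image>http://example.com/1.jpg</image:image></item>"
                + "</channel></rss>";
        System.out.println(imageUrls(xml));
    }
}
```

With ROME itself, the same text would presumably be read from the JDOM elements returned by getForeignMarkup() rather than from a second parse.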

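Answer 2 (score: 0)

I solved this by parsing the feed with ROME and then parsing it a second time to get the raw JDOM document. I can then take the item elements from the feed and look for the image. It is a bit hacky, but easier than extending the RSS parser and so on.

The following Groovy code does this, using a MyWireFeedInput class to get the JDOM document: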

byte[] data = ... bytes for the feed ...
SyndFeedInput input = new SyndFeedInput()
input.allowDoctypes = true
SyndFeed sf = input.build(new XmlReader(new ByteArrayInputStream(data)))

Document doc = new MyWireFeedInput().getDocument(new XmlReader(new ByteArrayInputStream(data)))
Element channel = doc.rootElement.getChild("channel")
List<Element> items = channel ? channel.getChildren("item") : null

List<SyndEntry> entries = sf.entries
for (int i = 0; i < entries.size(); i++) {
    SyndEntry entry = entries[i]
    Element item = items ? items[i] : null
    if (item) {
        Element image = item.getChild("image")
        ... add it to enclosures or whatever ...
    }
}
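The same "parse the raw XML a second time" idea can be sketched in plain Java. This version deliberately swaps JDOM for the JDK's built-in DOM parser so that it runs without ROME on the classpath. The class name RawItemImageSketch and the sample feed are hypothetical, and it assumes the <image> element holds the URL as its text. It returns one value per <item> in document order, so the result can be matched to ROME's entries by index as in the loop above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of ROME.
class RawItemImageSketch {

    // One entry per <item>, in document order; null where an item has no <image>.
    static List<String> imageUrlsPerItem(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList items = doc.getElementsByTagName("item");
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                urls.add(directChildText((Element) items.item(i), "image"));
            }
            return urls;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed XML", e);
        }
    }

    // Text of the first direct child element with the given name, or null.
    static String directChildText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up sample feed for illustration only.
        String xml = "<rss version=\"2.0\"><channel>"
                + "<item><title>a</title><image>http://example.com/a.jpg</image></item>"
                + "<item><title>b</title></item>"
                + "</channel></rss>";
        System.out.println(imageUrlsPerItem(xml));
    }
}
```

Keeping a null placeholder for items without an image is what makes index-based matching safe. Dropping those items would shift every later image onto the wrong entry.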

Answer 3 (score: 0)

The answer is simple. First get the SyndContent using the ROME API. Below is the code that reads the images and everything else in the RSS feed:

<%@ page import="com.rometools.rome.feed.synd.SyndFeed"%>
<%@ page import="com.rometools.rome.feed.synd.SyndEntry"%>
<%@ page import="com.rometools.rome.feed.synd.SyndContent"%>
<%@ page import="com.rometools.modules.mediarss.MediaEntryModule"%>
<%@ page import="com.rometools.rome.feed.module.Module"%>
<%@ page import="com.rometools.modules.mediarss.types.Thumbnail"%>
<%@ page import="java.util.Iterator"%>
<%@ page import="java.util.List"%>
<html>
<head>
<title>website</title>
<link href="/css/style.css" rel="stylesheet" type="text/css" />
</head>
<body>
    <h1>Home</h1>
    <%
        HttpSession session1=request.getSession(false);

        SyndFeed syndFeed11= (SyndFeed) session1.getAttribute("syndFeed");

    %>
    <h2><%=syndFeed11.getTitle()%></h2>
    <ul>
        <% 
           Iterator it = syndFeed11.getEntries().iterator();
           while (it.hasNext())
           {
              SyndEntry entry = (SyndEntry) it.next();
         %>
        <li><a href="<%=entry.getLink()%>"><%=entry.getTitle()%></a> <%

                List<SyndContent> syndContents=entry.getContents();
        System.out.println(syndContents.size());
                for(SyndContent syndContent:syndContents)
                {
                    System.out.println(syndContent.getMode());
                    System.out.println("This is content"+syndContent.getValue());
                    %>
                    <%-- This string contains the link to the image; apply a regular expression to extract LINK from <img src="LINK"> --%>
                    <%=syndContent.getValue() %>
                    <%
                }
                //SyndContent syndContent=syndContents.get(0);



                for (Module module : entry.getModules()) {
            if (module instanceof MediaEntryModule) {
                MediaEntryModule media = (MediaEntryModule)module;
                for (Thumbnail thumb : media.getMetadata().getThumbnail()) {
                    %><img src="<%=thumb.getUrl() %>" />
            <%
                }
            }
        }
            %></li>
        <%  } %>
    </ul>
</body>
</html>

Below is the servlet class:

package website.web;
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URL;
import javax.servlet.RequestDispatcher;
import javax.servlet.ServletConfig;
import javax.servlet.ServletContext;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

import org.apache.log4j.Logger;

import com.rometools.rome.feed.synd.SyndFeed;
import com.rometools.rome.io.FeedException;
import com.rometools.rome.io.SyndFeedInput;
import com.rometools.rome.io.XmlReader;

public class HomeServlet extends HttpServlet {

/**
 * 
 */
private static final long serialVersionUID = 1L;
private Logger logger = Logger.getLogger(this.getClass());

@Override
public void init(ServletConfig config) throws ServletException {
    super.init(config);
}

@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException {
    String rssUrl=(String)req.getAttribute("rss");
    logger.debug("Retrieving RSS feed");
    URL url = new URL("https://www.reddit.com/.rss");
    SyndFeedInput syndFeedInput = new SyndFeedInput();
    HttpSession session=req.getSession();
    SyndFeed syndFeed = null;
    XmlReader xmlReader = new XmlReader(url);
    try {
        syndFeed = syndFeedInput.build(xmlReader);
        System.out.println("Done");

    } catch (IllegalArgumentException e) {
        logger.error("", e);
    } catch (FeedException e) {
        logger.error("", e);
    }
    logger.debug("Forwarding to home.jsp");
    req.setAttribute("syndFeed11", syndFeed);
    PrintWriter out = resp.getWriter();
    out.println("<h1>");
    out.println();
    session.setAttribute("syndFeed", syndFeed);
    out.println("</h1>");
    ServletContext context = getServletContext();
    RequestDispatcher dispatcher = context.getRequestDispatcher("/WEB-INF/jsp/home.jsp");
    dispatcher.forward(req,resp);

}
}
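The comment in the JSP suggests applying a regular expression to the content string to pull out the image link. A minimal sketch of that step follows, using only java.util.regex. The class name ImgSrcSketch and the pattern are assumptions made for illustration, not code from the answer. A regular expression is only a rough fit for HTML, so a real HTML parser may be the safer choice for messy feeds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration.
class ImgSrcSketch {
    // Matches <img ... src="..."> or src='...', ignoring case.
    // Group 1 is the quote character, group 2 is the link.
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns every src value found in the given HTML fragment, in order.
    static List<String> imageLinks(String html) {
        List<String> links = new ArrayList<>();
        if (html == null) {
            return links;
        }
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up content value, standing in for syndContent.getValue().
        String html = "<p><img alt=\"x\" src=\"http://example.com/a.png\"/> text</p>";
        System.out.println(imageLinks(html));
    }
}
```

In the JSP this would be called as ImgSrcSketch.imageLinks(syndContent.getValue()), and each returned link could then be rendered in an <img> tag.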