
Time: 2013-09-07 13:09:41

Tags: java dom4j

I am iterating over all the data at this webpage (sample XML below), and I am confused about how to get the values I need.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet title="XSL_formatting" type="text/xsl" href="/i/xml/xsl_formatting_rss.xml"?>
<rss xmlns:blogChannel="http://backend.userland.com/blogChannelModule" version="2.0">
    <channel>
        <title>Ariana Resources News</title>
        <description />
        <item>
            <title>Ariana Resources PLC - Environmental Impact Assessment Submitted for Kiziltepe</title>
            <description>Some Article information</description>
            <pubDate>Fri, 30 Aug 2013 07:00:00 GMT</pubDate>
        </item>
        <item>
            <title>Ariana Resources PLC - Directors' Dealings and Holding in Company</title>
            <description>Some Article information</description>
            <pubDate>Wed, 31 Jul 2013 07:00:00 GMT</pubDate>
        </item>
        <item>
            <title>Ariana Resources PLC - Directorship Changes</title>
            <description>Some Article information</description>
            <pubDate>Wed, 24 Jul 2013 09:31:00 GMT</pubDate>
        </item>
        <item>
            <title>Ariana Resources PLC - Ariana Resources plc : Capital Reorganisation</title>
            <description>Some Article information</description>
            <pubDate>Wed, 24 Jul 2013 09:31:00 GMT</pubDate>
        </item>
    </channel>
</rss>



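I need to go through each item and:

  1. check whether it has today's date, and
  2. if it does, get each of its specific values.

So far I have the following, and I think the second loop is badly wrong. Any help is much appreciated: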

        //Create a null Document Object
        Document theXML = null;
        //Get the document of the XML and assign to Document object
        theXML = parseXML(url);
        //Place the root element of theXML into a variable
        Element root = theXML.getRootElement();
        // iterate through child elements of root
        for ( Iterator i = root.elementIterator(); i.hasNext(); ) {
            Element element = (Element) i.next();
            // do something
            // iterate through child elements of root with element name "item"
            for ( Iterator j = root.elementIterator( "item" ); j.hasNext(); ) {
                Element foo = (Element) j.next();
                String rnsHeadline = "";
                String rnsLink = "";
                String rnsFullText = "";
                String rnsConstituentName = "";
                Rns rns = new Rns(null, null, null, null);
            }
        }

2 answers:

Answer 0 (score: 2)


// Select every <item> element under rss/channel with an XPath expression
List<? extends Node> items =
        (List<? extends Node>)theXML.selectNodes("//rss/channel/item");

// RFC-dictated date format used with RSS
DateFormat dateFormatterRssPubDate =
        new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss Z", Locale.ENGLISH);

// today started at this time
DateTime timeTodayStartedAt = new DateTime().withTimeAtStartOfDay();

for (Node node: items) {
     String pubDate = node.valueOf( "pubDate" );
     DateTime date = new DateTime(dateFormatterRssPubDate.parse(pubDate));
     if (date.isAfter(timeTodayStartedAt)) {
         // it's today, do something!
         System.out.println("Today: " + date);
     } else {
         System.out.println("Not today: " + date);
     }
}

Dom4j needs the jaxen dependency for XPath to work. I used JodaTime to compare the dates, because it is cleaner than using Java's built-in date classes. Here's the full example
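
For readers on Java 8 or later, the same "is it today" check can also be sketched with the JDK's java.time package instead of JodaTime. This is a swapped-in technique, not what the answer itself uses. The class name `PubDateCheck` and the method `isOnDay` are hypothetical names chosen for illustration, and the sketch assumes the feed's pubDate values follow the RFC 1123 layout shown in the sample XML.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, for illustration only.
final class PubDateCheck {

    // True when the RSS pubDate falls on the given calendar day, as seen in the given zone.
    static boolean isOnDay(String pubDate, LocalDate day, ZoneId zone) {
        // RFC_1123_DATE_TIME parses strings such as "Fri, 30 Aug 2013 07:00:00 GMT"
        OffsetDateTime published =
                OffsetDateTime.parse(pubDate, DateTimeFormatter.RFC_1123_DATE_TIME);
        // Convert to the caller's zone before dropping the time part
        return published.atZoneSameInstant(zone).toLocalDate().equals(day);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("UTC");
        String pubDate = "Fri, 30 Aug 2013 07:00:00 GMT";
        // Compare against a fixed day, then against today's date in the same zone
        System.out.println(isOnDay(pubDate, LocalDate.of(2013, 8, 30), zone));
        System.out.println(isOnDay(pubDate, LocalDate.now(zone), zone));
    }
}
```

Passing the zone explicitly makes it visible that "today" depends on where the reader is: an item published late in the evening GMT already belongs to the next day in a zone east of Greenwich.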

Note that dom4j is not really maintained any more, so you may also be interested in this discussion about dom4j alternatives.
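
As one possible alternative, the same XPath selection can be sketched with only the JDK's built-in `javax.xml.xpath` API, with no dom4j or jaxen on the classpath. This is not taken from the linked discussion. The class name `JdkRssItems` and the method `titlesWithDates` are hypothetical, and the sketch assumes the rss/channel/item layout used in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
final class JdkRssItems {

    // Returns "title | pubDate" for every <item> under rss/channel.
    static List<String> titlesWithDates(String rssXml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xpath.evaluate(
                    "/rss/channel/item",
                    new InputSource(new StringReader(rssXml)),
                    XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Node item = items.item(i);
                // Relative expressions are evaluated against the current <item>
                String title = xpath.evaluate("title", item);
                String pubDate = xpath.evaluate("pubDate", item);
                result.add(title + " | " + pubDate);
            }
            return result;
        } catch (XPathExpressionException e) {
            // Also covers input that cannot be parsed as XML
            throw new IllegalArgumentException("Could not evaluate XPath on the feed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<rss version=\"2.0\"><channel><title>Ariana Resources News</title>"
                + "<item><title>Ariana Resources PLC - Directorship Changes</title>"
                + "<pubDate>Wed, 24 Jul 2013 09:31:00 GMT</pubDate></item>"
                + "</channel></rss>";
        for (String line : titlesWithDates(xml)) {
            System.out.println(line);
        }
    }
}
```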

Answer 1 (score: 0)


public class Dom4JRssParser {

    private void parse(Date day) throws DocumentException, ParseException {
        Date dayOnly = removeTime(day);

        // Fri, 30 Aug 2013 07:00:00 GMT
        SimpleDateFormat sdfXml = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss z", Locale.ENGLISH);
        System.out.println("Day: " + sdfXml.format(dayOnly));

        SAXReader reader = new SAXReader();
        Document doc = reader.read(getClass().getResourceAsStream("/com/so/dom4j/parser/rss/example_01.xml"));
        Element root = doc.getRootElement(); // rss
        for(Iterator rootIt = root.elementIterator("channel"); rootIt.hasNext(); ) {
            Element channel = (Element) rootIt.next();
            for(Iterator itemIt = channel.elementIterator("item"); itemIt.hasNext(); ) {
                Element item = (Element) itemIt.next();
                Element pubDate = item.element("pubDate");
                if(pubDate != null) {
                    if(removeTime(sdfXml.parse(pubDate.getTextTrim())).equals(dayOnly)) {
                        Rns rns = new Rns(item.element("title"), 

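    // Returns the given date with its time-of-day fields reset to midnight
    private Date removeTime(Date day) {
        Calendar c = Calendar.getInstance(Locale.ENGLISH);
        // Start from the given date; without this the calendar holds the current time
        c.setTime(day);
        c.set(Calendar.HOUR_OF_DAY, 0);
        c.set(Calendar.MINUTE, 0);
        c.set(Calendar.SECOND, 0);
        c.set(Calendar.MILLISECOND, 0);
        return c.getTime();
    }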

    public static void main(String... args) throws ParseException, DocumentException {
        Dom4JRssParser o = new Dom4JRssParser();
        if(args.length == 0) {
            o.parse(new Date());
        } else {
            SimpleDateFormat sdfInput = new SimpleDateFormat("yyyyMMdd");
            for(String arg : args) {




Day: Wed, 31 Jul 2013 00:00:00 CEST
Rns [rnsHeadline=Ariana Resources PLC - Directors' Dealings and Holding in Company
rnsFullText=Some Article information

Also, you may want to consider the XPath API (the "Powerful Navigation with XPath" section of the quick start link you posted), since it is more comfortable to use; see eis's answer.