Scala XML processing is skipping a value

Time: 2015-04-09 17:30:27

Tags: xml scala

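I'm trying to build a REST API in Scala that fetches the XML of several RSS feeds and then serves them as JSON. So far I can display them as text, which is fine, but I can't get the author to show up. I build a list of articles (where Article is a case class) and search the XML to fill in the values of the Article class.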

<item>
  <title>Chinese TV Star Apologizes For Remarks Critical Of Mao</title>
  <description>Bi Fujian, one of the country's most popular television presenters, recently ran afoul of his employer, state-run CCTV, for a parody song he performed at a private banquet.</description>
  <pubDate>Thu, 09 Apr 2015 12:51:15 -0400</pubDate>
  <link>http://www.npr.org/blogs/thetwo-way/2015/04/09/398534903/chinese-tv-star-apologizes-for-remarks-critical-of-mao?utm_medium=RSS&amp;utm_campaign=news</link>
  <guid>http://www.npr.org/blogs/thetwo-way/2015/04/09/398534903/chinese-tv-star-apologizes-for-remarks-critical-of-mao?utm_medium=RSS&amp;utm_campaign=news</guid>
  <content:encoded><![CDATA[<p>Bi Fujian, one of the country's most popular television presenters, recently ran afoul of his employer, state-run CCTV, for a parody song he performed at a private banquet.</p><p><a href="http://www.npr.org/templates/email/emailAFriend.php?storyId=398534903">&raquo; E-Mail This</a></p>]]></content:encoded>
  <dc:creator>Scott Neuman</dc:creator>
</item>

This is an example of the XML I'm parsing. This is the code I use to parse it:

def xml = XML.loadString(retrieveArticles("http://www.npr.org/rss/rss.php?id=1007")) ++ XML.loadString(retrieveArticles("http://www.npr.org/rss/rss.php?id=1003")) ++ XML.loadString(retrieveArticles("http://www.npr.org/rss/rss.php?id=1001"))

    val articles = (xml \\ "item").foldLeft(List[Article]())((ls,item) => Article((item \ "title").text,
        (item \ "dc:creator").text,
        (item \ "pubDate").text,
        (item \ "link").text,
        (item \ "description").text) :: ls)

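All the other values are processed correctly. The author is the only value that doesn't appear. When I call the API to display the articles, this is what I get: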

Title: Chinese TV Star Apologizes For Remarks Critical Of Mao,
Author: ,
Date Published: Thu, 09 Apr 2015 12:51:00 -0400,
Link: http://www.npr.org/blogs/thetwo-way/2015/04/09/398534903/chinese-tv-star-apologizes-for-remarks-critical-of-mao?utm_medium=RSS&utm_campaign=news,
Contents: Bi Fujian, one of the country's most popular television presenters, recently ran afoul of his employer, state-run CCTV, for a parody song he performed at a private banquet.

Why isn't the author displayed when all the other values are?

1 Answer:

Answer 0 (score: 2)

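The colon (:) in XML is a special character that separates an element's (optional) prefix from its label. The label of the element you're looking for is therefore creator, not dc:creator. Read up on XML namespace prefixes for the details.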

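If you need to select an element by both prefix and label, you can use the prefix property. Here is a simplified version of the problem you're running into: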

val xml = <root><foo:bar/><qux:bar/></root>
xml \\ "foo:bar" // No elements found!  This is the wrong selector.
xml \\ "bar" // NodeSeq(<foo:bar/>, <qux:bar/>)
(xml \\ "bar").filter(_.prefix == "foo") //NodeSeq(<foo:bar/>)

So in your example you just want (item \ "creator") for the author, filtered down to the dc prefix if necessary.
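Applied to a trimmed-down version of the item from the question, the lookup might look like the sketch below. It assumes the scala-xml library is on the classpath and that the feed binds the dc prefix to the Dublin Core namespace URI shown (the question does not include the feed's xmlns declarations, so that binding is an assumption). The object name CreatorLookup and the val names are made up for illustration.

```scala
import scala.xml._

object CreatorLookup {
  // Assumed namespace URI for the dc prefix; check the xmlns:dc declaration of the real feed.
  val dcUri = "http://purl.org/dc/elements/1.1/"

  // Trimmed-down copy of the item from the question, with the prefix declared locally.
  val item: Elem =
    <item xmlns:dc="http://purl.org/dc/elements/1.1/">
      <title>Chinese TV Star Apologizes For Remarks Critical Of Mao</title>
      <dc:creator>Scott Neuman</dc:creator>
    </item>

  // Selecting by the qualified name matches nothing, because a label never contains the prefix.
  val byQualifiedName: String = (item \ "dc:creator").text

  // Selecting by the bare label finds the element.
  val byLabel: String = (item \ "creator").text

  // Narrow the match to the dc prefix ...
  val byPrefix: String = (item \ "creator").filter(_.prefix == "dc").text

  // ... or to the URI the prefix is bound to, which also works if a feed picks another prefix.
  val byNamespace: String = (item \ "creator").filter(_.namespace == dcUri).text

  def main(args: Array[String]): Unit = {
    println(s"qualified name: '$byQualifiedName'")
    println(s"label:          '$byLabel'")
    println(s"prefix filter:  '$byPrefix'")
    println(s"namespace:      '$byNamespace'")
  }
}
```

Filtering on the namespace URI rather than the prefix is probably the more robust choice when several feeds are combined, since each feed is free to choose its own prefix for the same namespace.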

As a side note, you can use map instead of foldLeft in your code, which is tidier and more idiomatic:

(xml \\ "item").map { item => Article(
    (item \ "title").text,
    (item \ "creator").text,
    (item \ "pubDate").text,
    (item \ "link").text,
    (item \ "description").text
)}
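For completeness, here is a self-contained sketch of the map version. The question never shows the Article case class, so the field names below are guesses, and the sample feed (including the second item and the example.org links) is invented purely for illustration. It assumes scala-xml is available.

```scala
import scala.xml._

// Hypothetical shape of the asker's case class; the real field names are not shown in the question.
final case class Article(
  title: String,
  author: String,
  published: String,
  link: String,
  description: String
)

object FeedToArticles {
  // Build one Article per <item>, looking the author up by its label "creator".
  def toArticles(feed: NodeSeq): List[Article] =
    (feed \\ "item").map { item =>
      Article(
        (item \ "title").text,
        (item \ "creator").text,
        (item \ "pubDate").text,
        (item \ "link").text,
        (item \ "description").text
      )
    }.toList

  // Made-up sample feed; the second item is a placeholder, not real data.
  val sample: Elem =
    <rss xmlns:dc="http://purl.org/dc/elements/1.1/">
      <channel>
        <item>
          <title>Chinese TV Star Apologizes For Remarks Critical Of Mao</title>
          <pubDate>Thu, 09 Apr 2015 12:51:15 -0400</pubDate>
          <link>http://example.org/first</link>
          <description>First description</description>
          <dc:creator>Scott Neuman</dc:creator>
        </item>
        <item>
          <title>Placeholder Second Story</title>
          <pubDate>Thu, 09 Apr 2015 13:00:00 -0400</pubDate>
          <link>http://example.org/second</link>
          <description>Second description</description>
          <dc:creator>Placeholder Author</dc:creator>
        </item>
      </channel>
    </rss>

  def main(args: Array[String]): Unit =
    toArticles(sample).foreach(println)
}
```

One behavioural difference worth knowing: map keeps the items in feed order, whereas the foldLeft in the question prepends each Article with ::, so its resulting list is in reverse order.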