Scala XML processing is skipping a value

Time: 2015-04-09 17:30:27

Tags: xml scala

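I am trying to develop a REST API in Scala that fetches the XML of several RSS feeds and then serves them as JSON. So far I can display them as text, which is fine, but I cannot get the author to show up. I am building a list of articles (where Article is a case class) and searching the XML to supply the values for each Article.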

  <title>Chinese TV Star Apologizes For Remarks Critical Of Mao</title>
  <description>Bi Fujian, one of the country's most popular television presenters, recently ran afoul of his employer, state-run CCTV, for a parody song he performed at a private banquet.</description>
  <pubDate>Thu, 09 Apr 2015 12:51:15 -0400</pubDate>
  <content:encoded><![CDATA[<p>Bi Fujian, one of the country's most popular television presenters, recently ran afoul of his employer, state-run CCTV, for a parody song he performed at a private banquet.</p><p><a href="">&raquo; E-Mail This</a></p>]]></content:encoded>
  <dc:creator>Scott Neuman</dc:creator>


def xml = XML.loadString(retrieveArticles("")) ++ XML.loadString(retrieveArticles("")) ++ XML.loadString(retrieveArticles(""))

    val articles = (xml \\ "item").foldLeft(List[Article]())((ls,item) => Article((item \ "title").text,
        (item \ "dc:creator").text,
        (item \ "pubDate").text,
        (item \ "link").text,
        (item \ "description").text) :: ls)


Title: Chinese TV Star Apologizes For Remarks Critical Of Mao,
Author: ,
Date Published: Thu, 09 Apr 2015 12:51:00 -0400,
Link:   tv-star-apologizes-for-remarks-critical-of-mao?utm_medium=RSS&utm_campaign=news,
Contents: Bi Fujian, one of the country's most popular television presenters, recently ran afoul of his employer, state-run CCTV, for a parody song he performed at a private banquet.


1 answer:

Answer 0 (score: 2)

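The colon : in an XML name is a special character that separates the (optional) prefix from the element's label. The label of the element you are looking for is therefore creator, not dc:creator, and the \ and \\ selectors match on the label only. Read up on prefixes in XML here.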


val xml = <root><foo:bar/><qux:bar/></root>
xml \\ "foo:bar" // No elements found!  This is the wrong selector.
xml \\ "bar" // NodeSeq(<foo:bar/>, <qux:bar/>)
(xml \\ "bar").filter(_.prefix == "foo") //NodeSeq(<foo:bar/>)
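
To check the same behaviour against something shaped like the feed in the question, here is a minimal sketch. It makes several assumptions that are not in the original post: the field names of Article, the inlined feed string (trimmed down from the question's item), the CreatorExample object name, and the xmlns:dc declaration, which uses the standard Dublin Core namespace URI. It selects on the label creator and then keeps only nodes whose prefix is dc.

```scala
import scala.xml.XML

// Hypothetical shape for Article; the question does not show its definition.
final case class Article(
  title: String,
  author: String,
  pubDate: String,
  link: String,
  description: String
)

object CreatorExample {
  // A trimmed-down feed, inlined so the sketch does not touch the network.
  // The xmlns:dc declaration is an assumption about the real feed.
  val feed: String =
    """<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
      |  <channel>
      |    <item>
      |      <title>Chinese TV Star Apologizes For Remarks Critical Of Mao</title>
      |      <pubDate>Thu, 09 Apr 2015 12:51:15 -0400</pubDate>
      |      <dc:creator>Scott Neuman</dc:creator>
      |    </item>
      |  </channel>
      |</rss>""".stripMargin

  val xml = XML.loadString(feed)

  val articles: List[Article] = (xml \\ "item").map { item =>
    Article(
      (item \ "title").text,
      // Match on the label, then narrow to the dc prefix.
      (item \ "creator").filter(_.prefix == "dc").text,
      (item \ "pubDate").text,
      (item \ "link").text,        // absent in this trimmed feed, so empty
      (item \ "description").text  // absent in this trimmed feed, so empty
    )
  }.toList

  def main(args: Array[String]): Unit = articles.foreach(println)
}
```

The prefix filter only matters if a feed could carry another element labelled creator under a different prefix; for a single-prefix feed the plain label selector should be enough.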

So in your example you just want to use (item \ "creator") for the author, filtering down to the dc prefix if necessary.


(xml \\ "item").map { item => Article(
    (item \ "title").text,
    (item \ "creator").text,
    (item \ "pubDate").text,
    (item \ "link").text,
    (item \ "description").text