Plist XPath query with dict elements

Time: 2014-10-17 07:12:19

Tags: ruby xml xpath nokogiri

I'm trying to use Nokogiri to load song names from a plist export of an iTunes library:

doc = Nokogiri::XML(File.open(file.path))

@songs = Array.new  
doc.xpath(<XPATH_HERE>).each do |n|
  @songs.push(n)  #append data to array
end 

The beginning of the plist looks like:

<plist version="1.0">
<dict>
  <key>Major Version</key><integer>1</integer>
  <key>Minor Version</key><integer>1</integer>
  <key>Date</key><date>2014-10-15T22:52:19Z</date>
  <key>Application Version</key><string>11.4</string>
  <key>Features</key><integer>5</integer>
  <key>Show Content Ratings</key><true/>
  <key>Music Folder</key><string>file://localhost/Users/mike/Music/iTunes/iTunes%20Media/</string>
  <key>Library Persistent ID</key><string>280B84572DDCF406</string>
  <key>Tracks</key>
  <dict>
    <key>96</key>
    <dict>
      <key>Track ID</key><integer>96</integer>
      <key>Name</key><string>Get Lucky (Daft Punk cover)</string>
      <key>Artist</key><string>Daughter</string>
      <key>Kind</key><string>MPEG audio file</string>
      <key>Size</key><integer>4716638</integer>
      <key>Total Time</key><integer>294112</integer>
      <key>Date Modified</key><date>2013-11-12T20:54:14Z</date>
      <key>Date Added</key><date>2013-12-18T17:56:09Z</date>
      <key>Bit Rate</key><integer>128</integer>
      <key>Sample Rate</key><integer>44100</integer>
      <key>Persistent ID</key><string>C3B1B6F26134C9C1</string>
      <key>Track Type</key><string>File</string>
      <key>Location</key><string>file://localhost/Users/mike/Music/iTunes/iTunes%20Media/Music/Daughter/Unknown%20Album/Get%20Lucky%20(Daft%20Punk%20cover).mp3</string>
      <key>File Folder Count</key><integer>5</integer>
      <key>Library Folder Count</key><integer>1</integer>
    </dict>
    <key>98</key>
    <dict>
      <key>Track ID</key><integer>98</integer>
      <key>Name</key><string>Swimming in Solace (DJ Fergie Ferg Remash)</string>
      <key>Kind</key><string>MPEG audio file</string>

What I want to load from each track is the track name string that follows the Name key. The XPath I thought should work is:

/plist/dict[key[. = 'Tracks']/following-sibling::*[1]]/dict[key/following-sibling::*[1]]/dict[key[. = 'Name']/following-sibling::*[1]]/string

That XPath returns:

<string>Get Lucky (Daft Punk cover)</string>
<string>Daughter</string>
<string>MPEG audio file</string>
<string>C3B1B6F26134C9C1</string>
<string>File</string>
<string>file://localhost/Users/mike/Music/iTunes/iTunes%20Media/Music/Daughter/Unknown%20Album/Get%20Lucky%20(Daft%20Punk%20cover).mp3</string>
<string>Swimming in Solace (DJ Fergie Ferg Remash)</string>
<string>MPEG audio file</string>

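It seems that although my XPath specifies the key for each string, it is in fact applying 'following-sibling' to every key in each dict regardless.

How can I make the query more specific, so that for this part of the plist it returns: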

Get Lucky (Daft Punk cover)

Swimming in Solace (DJ Fergie Ferg Remash)

2 Answers:

Answer 0 (score: 3)

Here is one possible XPath:

/plist/dict[key='Tracks']/dict/dict/key[.='Name']/following-sibling::string[1]

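The beginning of the XPath may vary, but I think the most important part is the last 2 path steps (key[.='Name']/following-sibling::string[1]). They say: for each <key>Name</key> element, get the nearest <string> element that follows it as a sibling.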

Answer 1 (score: 0)

I'd do something like this:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
    <plist version="1.0">
    <dict>
      <key>Major Version</key><integer>1</integer>
      <key>Minor Version</key><integer>1</integer>
      <key>Date</key><date>2014-10-15T22:52:19Z</date>
      <key>Application Version</key><string>11.4</string>
      <key>Features</key><integer>5</integer>
      <key>Show Content Ratings</key><true/>
      <key>Music Folder</key><string>file://localhost/Users/mike/Music/iTunes/iTunes%20Media/</string>
      <key>Library Persistent ID</key><string>280B84572DDCF406</string>
      <key>Tracks</key>
      <dict>
        <key>96</key>
        <dict>
          <key>Track ID</key><integer>96</integer>
          <key>Name</key><string>Get Lucky (Daft Punk cover)</string>
          <key>Artist</key><string>Daughter</string>
          <key>Kind</key><string>MPEG audio file</string>
          <key>Size</key><integer>4716638</integer>
          <key>Total Time</key><integer>294112</integer>
          <key>Date Modified</key><date>2013-11-12T20:54:14Z</date>
          <key>Date Added</key><date>2013-12-18T17:56:09Z</date>
          <key>Bit Rate</key><integer>128</integer>
          <key>Sample Rate</key><integer>44100</integer>
          <key>Persistent ID</key><string>C3B1B6F26134C9C1</string>
          <key>Track Type</key><string>File</string>
          <key>Location</key><string>file://localhost/Users/mike/Music/iTunes/iTunes%20Media/Music/Daughter/Unknown%20Album/Get%20Lucky%20(Daft%20Punk%20cover).mp3</string>
          <key>File Folder Count</key><integer>5</integer>
          <key>Library Folder Count</key><integer>1</integer>
        </dict>
        <key>98</key>
        <dict>
          <key>Track ID</key><integer>98</integer>
          <key>Name</key><string>Swimming in Solace (DJ Fergie Ferg Remash)</string>
          <key>Kind</key><string>MPEG audio file</string>
EOT

With that setup, the code is:

doc.search('dict dict dict').map{ |d| d.at('./key[2]').next_sibling.text }
# => ["Get Lucky (Daft Punk cover)",
#     "Swimming in Solace (DJ Fergie Ferg Remash)"]

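I prefer to use CSS selectors when possible, and Nokogiri doesn't care whether we use them or XPath against XML content, hence search('dict dict dict'). XPath is then handy for grabbing the nth element, which leads to at('./key[2]') to get the <key> node. next_sibling then returns the next node.

It can be done in pure XPath, but I find that looks like line noise, and I prefer this hybrid approach. Pure XPath might run faster, but I can maintain the code faster my way.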