我知道它再次成为一个非常棒的问题,但我现在已经在互联网上磕磕绊绊几天了,无法解决我的问题。我已经从discogs下载了数据转储,这是一个大约35 GB的xml文件。到目前为止,我必须使用SAX-Parser,因为我显然无法将此文件加载到我的RAM中,并且那只牛在红宝石中获得了最好的运行时间,但我只是不喜欢'了解如何使用这个解析器,即使是使用小型IO-Objects或只是为了测试它仍然是一件神奇的事情,把东西扔回给我,我不明白。这就是xml的样子:
<releases>
<release id="1" status="Accepted"><images><image height="600" type="primary" uri="" uri150="" width="600"/><image height="600" type="secondary" uri="" uri150="" width="600"/><image height="600" type="secondary" uri="" uri150="" width="600"/><image height="600" type="secondary" uri="" uri150="" width="600"/></images><artists><artist><id>1</id><name>The Persuader</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists><title>Stockholm</title><labels><label catno="SK032" id="5" name="Svek"/></labels><extraartists><artist><id>239</id><name>Jesper Dahlbäck</name><anv></anv><join></join><role>Music By [All Tracks By]</role><tracks></tracks></artist></extraartists><formats><format name="Vinyl" qty="2" text=""><descriptions><description>12"</description><description>33 ⅓ RPM</description></descriptions></format></formats><genres><genre>Electronic</genre></genres><styles><style>Deep House</style></styles><country>Sweden</country><released>1999-03-00</released><notes>The song titles are the names of six of Stockholm's 82 districts.
Title on label: - Stockholm -
Recorded at the Globe Studio, Stockholm
FAX: +46 8 679 64 53
</notes><data_quality>Needs Vote</data_quality><tracklist><track><position>A</position><title>Östermalm</title><duration>4:45</duration></track><track><position>B1</position><title>Vasastaden</title><duration>6:11</duration></track><track><position>B2</position><title>Kungsholmen</title><duration>2:49</duration></track><track><position>C1</position><title>Södermalm</title><duration>5:38</duration></track><track><position>C2</position><title>Norrmalm</title><duration>4:52</duration></track><track><position>D</position><title>Gamla Stan</title><duration>5:16</duration></track></tracklist><identifiers><identifier description="A-Side Runout" type="Matrix / Runout" value="MPO SK 032 A1"/><identifier description="B-Side Runout" type="Matrix / Runout" value="MPO SK 032 B1"/><identifier description="C-Side Runout" type="Matrix / Runout" value="MPO SK 032 C1"/><identifier description="D-Side Runout" type="Matrix / Runout" value="MPO SK 032 D1"/><identifier description="Only On A-Side Runout" type="Matrix / Runout" value="G PHRUPMASTERGENERAL T27 LONDON"/></identifiers><videos><video duration="326" embed="true" src="https://www.youtube.com/watch?v=afMHNll9EVM"><title>The Persuader - Gamla Stan</title><description>The Persuader - Gamla Stan</description></video><video duration="301" embed="true" src="https://www.youtube.com/watch?v=EBBHR3EMN50"><title>The Persuader - Norrmalm</title><description>The Persuader - Norrmalm</description></video><video duration="341" embed="true" src="https://www.youtube.com/watch?v=WDZqiENap_U"><title>The Persuader - Södermalm</title><description>The Persuader - Södermalm</description></video><video duration="176" embed="true" src="https://www.youtube.com/watch?v=XExCZfMCXdo"><title>The Persuader - Kungsholmen</title><description>The Persuader - Kungsholmen</description></video><video duration="376" embed="true" src="https://www.youtube.com/watch?v=Cawyll0pOI4"><title>The Persuader - Vasastaden</title><description>The Persuader - Vasastaden</description></video><video duration="296" embed="true" src="https://www.youtube.com/watch?v=MpmbntGDyNE"><title>The Persuader - Östermalm</title><description>The Persuader - Östermalm</description></video></videos><companies><company><id>271046</id><name>The Globe Studios</name><catno></catno><entity_type>23</entity_type><entity_type_name>Recorded At</entity_type_name><resource_url>https://api.discogs.com/labels/271046</resource_url></company><company><id>56025</id><name>MPO</name><catno></catno><entity_type>17</entity_type><entity_type_name>Pressed By</entity_type_name><resource_url>https://api.discogs.com/labels/56025</resource_url></company></companies></release>
<release id="2" status="Accepted"><images><image height="394" type="primary" uri="" uri150="" width="400"/><image height="600" type="secondary" uri="" uri150="" width="600"/><image height="600" type="secondary" uri="" uri150="" width="600"/></images><artists><artist><id>2</id><name>Mr. James Barth & A.D.</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists><title>Knockin' Boots Vol 2 Of 2</title><labels><label catno="SK 026" id="5" name="Svek"/><label catno="SK026" id="5" name="Svek"/></labels><extraartists><artist><id>26</id><name>Alexi Delano</name><anv></anv><join></join><role>Producer, Recorded By</role><tracks></tracks></artist><artist><id>27</id><name>Cari Lekebusch</name><anv></anv><join></join><role>Producer, Recorded By</role><tracks></tracks></artist><artist><id>26</id><name>Alexi Delano</name><anv>A. Delano</anv><join></join><role>Written-By</role><tracks></tracks></artist><artist><id>27</id><name>Cari Lekebusch</name><anv>C. Lekebusch</anv><join></join><role>Written-By</role><tracks></tracks></artist></extraartists><formats><format name="Vinyl" qty="1" text=""><descriptions><description>12"</description><description>33 ⅓ RPM</description></descriptions></format></formats><genres><genre>Electronic</genre></genres><styles><style>Broken Beat</style><style>Techno</style><style>Tech House</style></styles><country>Sweden</country><released>1998-06-00</released><notes>All joints recorded in NYC (Dec.97).</notes><data_quality>Correct</data_quality><master_id is_main_release="true">713738</master_id><tracklist><track><position>A1</position><title>A Sea Apart</title><duration>5:08</duration></track><track><position>A2</position><title>Dutchmaster</title><duration>4:21</duration></track><track><position>B1</position><title>Inner City Lullaby</title><duration>4:22</duration></track><track><position>B2</position><title>Yeah Kid!</title><duration>4:46</duration></track></tracklist><identifiers><identifier description="Side A Runout Etching" type="Matrix / Runout" value="MPO SK026-A -J.T.S.-"/><identifier description="Side B Runout Etching" type="Matrix / Runout" value="MPO SK026-B -J.T.S.-"/></identifiers><videos><video duration="268" embed="true" src="https://www.youtube.com/watch?v=LgLchSRehhc"><title>Mr. James Barth & A.D. - Dutchmaster</title><description>Mr. James Barth & A.D. - Dutchmaster</description></video><video duration="297" embed="true" src="https://www.youtube.com/watch?v=x_Os7b-iWKs"><title>Mr. James Barth & A.D. - Yeah Kid!</title><description>Mr. James Barth & A.D. - Yeah Kid!</description></video><video duration="314" embed="true" src="https://www.youtube.com/watch?v=MIgQNVhYILA"><title>Mr. James Barth & A.D. - A Sea Apart</title><description>Mr. James Barth & A.D. - A Sea Apart</description></video><video duration="267" embed="true" src="https://www.youtube.com/watch?v=iaqHaULlqqg"><title>Mr. James Barth & A.D. - Inner City Lullaby</title><description>Mr. James Barth & A.D. - Inner City Lullaby</description></video></videos><companies><company><id>266169</id><name>JTS Studios</name><catno></catno><entity_type>29</entity_type><entity_type_name>Mastered At</entity_type_name><resource_url>https://api.discogs.com/labels/266169</resource_url></company><company><id>56025</id><name>MPO</name><catno></catno><entity_type>17</entity_type><entity_type_name>Pressed By</entity_type_name><resource_url>https://api.discogs.com/labels/56025</resource_url></company></companies></release>
<release id="3" status="Accepted"><images><image height="595" type="primary" uri="" uri150="" width="600"/><image height="472" type="secondary" uri="" uri150="" width="600"/><image height="600" type="secondary" uri="" uri150="" width="599"/><image height="470" type="secondary" uri="" uri150="" width="600"/></images><artists><artist><id>3</id><name>Josh Wink</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists><title>Profound Sounds Vol. 1</title><labels><label catno="CK 63628" id="6" name="Ruffhouse Records"/></labels><extraartists><artist><id>3</id><name>Josh Wink</name><anv></anv><join></join><role>DJ Mix</role><tracks></tracks></artist></extraartists><formats><format name="CD" qty="1" text=""><descriptions><description>Compilation</description><description>Mixed</description></descriptions></format></formats><genres><genre>Electronic</genre></genres><styles><style>Techno</style><style>Tech House</style></styles><country>US</country><released>1999-07-13</released><notes>1: Track title is given as "D2" (which is the side of record on the vinyl version of i220-010 release). This was also released on CD where this track is listed on 8th position. On both version no titles are given (only writing/producing credits). Both versions of i220-010 can be seen on the master release page [m27265]. Additionally this track contains female vocals that aren't present on original i220-010 release.
4: Credited as J. Dahlbäck.
5: Track title wrongly given as "Vol. 1".
6: Credited as Gez Varley presents Tony Montana.
12: Track exclusive to Profound Sounds Vol. 1.</notes><data_quality>Correct</data_quality><master_id is_main_release="false">66526</master_id><tracklist><track><position>1</position><title>Untitled 8</title><duration>7:00</duration><artists><artist><id>5</id><name>Heiko Laux</name><anv></anv><join>&</join><role></role><tracks></tracks></artist><artist><id>4</id><name>Johannes Heil</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>2</position><title>Anjua (Sneaky 3)</title><duration>5:28</duration><artists><artist><id>15525</id><name>Karl Axel Bissler</name><anv>K.A.B.</anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>3</position><title>When The Funk Hits The Fan (Mood II Swing When The Dub Hits The Fan)</title><duration>5:25</duration><artists><artist><id>7</id><name>Sylk 130</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists><extraartists><artist><id>8</id><name>Mood II Swing</name><anv></anv><join></join><role>Remix</role><tracks></tracks></artist></extraartists></track><track><position>4</position><title>What's The Time, Mr. Templar</title><duration>4:27</duration><artists><artist><id>1</id><name>The Persuader</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>5</position><title>Vol. 2</title><duration>5:36</duration><artists><artist><id>267132</id><name>Care Company (2)</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>6</position><title>Political Prisoner</title><duration>3:37</duration><artists><artist><id>6981</id><name>Gez Varley</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>7</position><title>Pop Kulture</title><duration>5:03</duration><artists><artist><id>11</id><name>DJ Dozia</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>8</position><title>K-Mart Shopping (Hi-Fi Mix)</title><duration>5:42</duration><artists><artist><id>10702</id><name>Nerio's Dubwork</name><anv></anv><join>Meets</join><role></role><tracks></tracks></artist><artist><id>233190</id><name>Kathy Lee</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists><extraartists><artist><id>23</id><name>Alex Hi-Fi</name><anv></anv><join></join><role>Remix</role><tracks></tracks></artist></extraartists></track><track><position>9</position><title>Lovelee Dae (Eight Miles High Mix)</title><duration>5:47</duration><artists><artist><id>13</id><name>Blaze</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists><extraartists><artist><id>14</id><name>Eight Miles High</name><anv></anv><join></join><role>Remix</role><tracks></tracks></artist></extraartists></track><track><position>10</position><title>Sweat</title><duration>6:06</duration><artists><artist><id>67226</id><name>Stacey Pullen</name><anv></anv><join>Presents</join><role></role><tracks></tracks></artist><artist><id>7554</id><name>Black Odyssey</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists><extraartists><artist><id>67226</id><name>Stacey Pullen</name><anv></anv><join></join><role>Presenter</role><tracks></tracks></artist></extraartists></track><track><position>11</position><title>Silver</title><duration>3:16</duration><artists><artist><id>3906</id><name>Christian Smith & John Selway</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>12</position><title>Untitled</title><duration>2:46</duration><artists><artist><id>3</id><name>Josh Wink</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>13</position><title>Boom Box</title><duration>3:41</duration><artists><artist><id>19</id><name>Sound Associates</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track><track><position>14</position><title>Track 2</title><duration>3:39</duration><artists><artist><id>20</id><name>Percy X</name><anv></anv><join></join><role></role><tracks></tracks></artist></artists></track></tracklist><identifiers><identifier type="Barcode" value="074646362822"/></identifiers>
&#13;
只是将其作为片段插入,是最简单的方法,抱歉。我现在要做的是寻找特殊的发布ID,检查他们是否有条形码,如果有条形码则将其取回。谁能请我指出正确的方向? 提前问候和感谢, rtuz2th
答案 0 :(得分:1)
SAX&#34;&#34; evented&#34; XML解析。 handler
具有调用的方法:
<child>
)</child>
)处理程序需要跟踪它当前在XML中的位置以及它感兴趣的值。因此它可以决定在遇到它感兴趣的元素时该怎么做。
你的示例XML有点大,所以我编写了我自己的小样本:
xml = <<EOS
<root>
<child id="1">
<barcode value="1111">
</child>
<child id="2">
</child>
<child id="1">
<barcode value="2222">
</child>
<child id="4">
<barcode value="3333">
</child>
</root>
EOS
我正在尝试查找具有child
ID和odd
条形码值的even
元素。
对于这个简单的例子,我跟踪堆栈上的所有标签和属性,在退出元素(@stack.pop
)时丢弃状态。这取决于XML文档的深度以及标签/属性的数量,这可能会花费很多,而且会很快。
require "ox"
require "stringio"
class Handler < ::Ox::Sax
def initialize
@stack = []
end
def start_element(element_name)
@stack << [element_name, {}]
end
def end_element(element_name)
parent_name, parent_attributes = @stack[-2]
if parent_name == :child && parent_attributes[:id].to_i.odd?
name, attributes = @stack[-1]
if name == :barcode && attributes[:value].to_i.even?
puts "Here is one record that seems interesting: Child: #{parent_attributes[:id]}, Barcode: #{attributes[:value]}"
end
end
@stack.pop
end
def attr(attribute_name, attribute_value)
_name, attributes = @stack.last
attributes[attribute_name] = attribute_value
end
end
handler = Handler.new
Ox.sax_parse(handler, StringIO.new(xml))
这将打印
这是一条看似有趣的记录:儿童:1,条形码:2222