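I'm parsing very large XML files (larger than 40 MB). I've only just started developing in Scala, so I looked around for good libraries and came across Scala Scales, which seems to be very good at handling large files.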
I've looked at: http://scala-scales.googlecode.com/svn/sites/scales/scales-xml_2.9.1/0.2/ScalesXmlIntro.html , http://scala-scales.googlecode.com/svn/sites/scales/scales-xml_2.9.2/0.4.4/PullParsing.html
Then I tested the pullXml function to make sure all the libraries were imported correctly:
import java.io.FileReader
import scales.utils._
import ScalesUtils._
import scales.xml._
import ScalesXml._

val pull = pullXml(new FileReader("/Users/mycrazyxml/tmp/large.xml"))
while (pull.hasNext) {
  pull.next match {
    case Left(i: XmlItem) =>
      // Handle XmlItem
      Logger.info("XmlItem: " + i)
    case Left(e: Elem) =>
      // Handle start Element
      Logger.info("Element: " + e)
    case Right(endElem) =>
      // Handle end element
      Logger.info("Endelement: " + endElem)
  }
}
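For comparison, the same pull-event idea can be sketched with the JDK's built-in StAX reader (`javax.xml.stream`) instead of Scales, so it runs without extra dependencies. The sketch also keeps a stack of the currently open elements, which is the piece of state you need later to know whether you are inside an Enterprise or a LocalUnit. The object and method names are hypothetical, and disabling DTD support so the reader does not try to fetch `info.dtd` is an assumption about the JDK reader's behaviour.

```scala
import java.io.StringReader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

object PullPaths {
  // Pull-parses `xml` with the JDK's StAX reader and returns the path of every
  // start element in document order, e.g. "/Info/Enterprise/RegNo".
  def startPaths(xml: String): List[String] = {
    val factory = XMLInputFactory.newInstance()
    // Assumption: with DTD support off the reader will not try to fetch info.dtd.
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, java.lang.Boolean.FALSE)
    val reader = factory.createXMLStreamReader(new StringReader(xml))
    var open = List.empty[String] // currently open elements, innermost first
    val paths = List.newBuilder[String]
    try {
      while (reader.hasNext) {
        reader.next() match {
          case XMLStreamConstants.START_ELEMENT =>
            open = reader.getLocalName :: open
            paths += open.reverse.mkString("/", "/", "")
          case XMLStreamConstants.END_ELEMENT =>
            open = open.tail
          case _ => // text, whitespace, comments: not needed for paths
        }
      }
    } finally reader.close()
    paths.result()
  }
}
```

Only the open-element stack is kept, so memory use depends on nesting depth, not file size.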
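This printed the whole file to the console. Great! Now it's time to create objects and save them to the database, but I'm having trouble working out a good way to do it. I could really use some good examples of how to approach this.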
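For example, the following XML has several Enterprise elements, each of which can contain one or more LocalUnit elements. The idea is to create an Enterprise object holding an array of its LocalUnits. When the end element is the closing Enterprise tag, call the save method with that Enterprise object and its LocalUnits.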
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Info SYSTEM "info.dtd">
<Info>
  <Enterprise>
    <RegNo>12345678</RegNo>
    <Address>
      <StreetInfo>
        <StreetName>Infinite Loop</StreetName>
        <StreetNumber>1</StreetNumber>
      </StreetInfo>
    </Address>
    <EName>
      <Legal>Crazy Company</Legal>
    </EName>
    <SNI>
      <Code>00000</Code>
      <Rank>1</Rank>
    </SNI>
    <LocalUnit>
      <CFARNo>987654321</CFARNo>
      <LUType>1</LUType>
      <LUName>Crazy Company Gym</LUName>
      <LUStatus>1</LUStatus>
      <SNI>
        <Code>46772</Code>
        <Rank>1</Rank>
      </SNI>
      <SNI>
        <Code>68203</Code>
        <Rank>2</Rank>
      </SNI>
      <Address>
        <StreetInfo>
          <StreetName>Infinite Loop</StreetName>
          <StreetNumber>1</StreetNumber>
        </StreetInfo>
      </Address>
    </LocalUnit>
    <LocalUnit>
      <CFARNo>987654322</CFARNo>
      <LUType>1</LUType>
      <LUName>Crazy Company Restaurant</LUName>
      <LUStatus>1</LUStatus>
      <SNI>
        <Code>46772</Code>
        <Rank>1</Rank>
      </SNI>
      <SNI>
        <Code>68203</Code>
        <Rank>2</Rank>
      </SNI>
      <Address>
        <StreetInfo>
          <StreetName>Infinite Loop</StreetName>
          <StreetNumber>1</StreetNumber>
        </StreetInfo>
      </Address>
    </LocalUnit>
  </Enterprise>
  <Enterprise>
    <RegNo>12345671220</RegNo>
    <Address>
      <StreetInfo>
        <StreetName>Cupertino Road</StreetName>
        <StreetNumber>2</StreetNumber>
      </StreetInfo>
    </Address>
    <EName>
      <Legal>Fun Company HQ</Legal>
    </EName>
    <SNI>
      <Code>00000</Code>
      <Rank>1</Rank>
    </SNI>
    <LocalUnit>
      <CFARNo>987654321</CFARNo>
      <LUType>1</LUType>
      <LUName>Fun Company</LUName>
      <LUStatus>1</LUStatus>
      <SNI>
        <Code>46772</Code>
        <Rank>1</Rank>
      </SNI>
      <SNI>
        <Code>68203</Code>
        <Rank>2</Rank>
      </SNI>
      <Address>
        <StreetInfo>
          <StreetName>Cupertino road</StreetName>
          <StreetNumber>2</StreetNumber>
        </StreetInfo>
      </Address>
    </LocalUnit>
  </Enterprise>
</Info>
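The plan described above (collect the LocalUnits, then call save when the Enterprise end tag arrives) can be sketched as a small state machine over pull events. This is a minimal sketch that swaps Scales for the JDK's StAX reader; the case classes `Sni`, `LocalUnit`, `Enterprise`, the object `EnterpriseStream` and the `save` callback are hypothetical stand-ins for your own model and database call, and only a few of the sample's fields are modelled.

```scala
import java.io.Reader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}
import scala.collection.mutable.ListBuffer

// Hypothetical model classes; only a few of the sample's fields are kept.
final case class Sni(code: String, rank: Int)
final case class LocalUnit(cfarNo: String, name: String, snis: List[Sni])
final case class Enterprise(regNo: String, legalName: String, localUnits: List[LocalUnit])

object EnterpriseStream {
  // Pull-parses `source` and calls `save` at every </Enterprise>, so at most one
  // Enterprise and its LocalUnits are held in memory at any time.
  def parse(source: Reader)(save: Enterprise => Unit): Unit = {
    val factory = XMLInputFactory.newInstance()
    // Assumption: with DTD support off the reader will not try to fetch info.dtd.
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, java.lang.Boolean.FALSE)
    val reader = factory.createXMLStreamReader(source)
    var path = List.empty[String] // open elements, innermost first
    val text = new StringBuilder  // character data of the element being read
    var regNo = ""; var legal = ""; var cfarNo = ""; var luName = ""
    var code = ""; var rank = 0
    val units = ListBuffer.empty[LocalUnit]
    val snis = ListBuffer.empty[Sni]
    try {
      while (reader.hasNext) {
        reader.next() match {
          case XMLStreamConstants.START_ELEMENT =>
            path = reader.getLocalName :: path
            text.clear()
            path.head match { // reset per-record state when a record starts
              case "Enterprise" => regNo = ""; legal = ""; units.clear()
              case "LocalUnit"  => cfarNo = ""; luName = ""; snis.clear()
              case _            =>
            }
          case XMLStreamConstants.CHARACTERS | XMLStreamConstants.CDATA =>
            text.append(reader.getText)
          case XMLStreamConstants.END_ELEMENT =>
            val value = text.toString.trim
            text.clear()
            // Match on (closing element :: parent) so that an Enterprise-level
            // SNI is not mistaken for a LocalUnit-level one.
            path match {
              case "RegNo" :: "Enterprise" :: _ => regNo = value
              case "Legal" :: "EName" :: _      => legal = value
              case "CFARNo" :: "LocalUnit" :: _ => cfarNo = value
              case "LUName" :: "LocalUnit" :: _ => luName = value
              case "Code" :: "SNI" :: _         => code = value
              case "Rank" :: "SNI" :: _         => rank = value.toInt
              case "SNI" :: "LocalUnit" :: _    => snis += Sni(code, rank)
              case "LocalUnit" :: _             => units += LocalUnit(cfarNo, luName, snis.toList)
              case "Enterprise" :: _            => save(Enterprise(regNo, legal, units.toList))
              case _                            => // Address, LUType, LUStatus, ... are ignored
            }
            path = path.tail
          case _ => // other events (comments, PIs, DTD, ...) are skipped
        }
      }
    } finally reader.close() // closes the StAX reader, not `source`
  }
}
```

For the real file you would pass something like `new FileReader("/Users/mycrazyxml/tmp/large.xml")` and your own database call as `save`, then close the `FileReader` yourself. Note that a `Reader` has already decoded the bytes, so the parser cannot honour the file's `ISO-8859-1` declaration; reading with an explicit charset (or handing the parser an `InputStream`) avoids mangled non-ASCII names.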
To sum up: for the given XML, how should I use pullXml to create my objects and call the save method with them?
Answer 0 (score: 2)
import scales.utils._
import ScalesUtils._
import scales.xml._
import ScalesXml._

val xmlFile = resource(this, "/data/enterprise_info.xml")
val xml = pullXml(xmlFile)

val Info = NoNamespaceQName("Info")
val Enterprise = NoNamespaceQName("Enterprise")
val LocalUnit = NoNamespaceQName("LocalUnit")
val LocalUnitName = NoNamespaceQName("LUName")
val EName = NoNamespaceQName("EName")
val Legal = NoNamespaceQName("Legal")

val EnterprisePath = List(Info, Enterprise)

// iterate over each Enterprise
// only one Enterprise at a time is in memory
val itr = iterate(EnterprisePath, xml)
for {
  enterprise <- itr
  enterpriseName <- enterprise \* EName \* Legal
} {
  println("enterprise " + text(enterpriseName) + " has units:")
  for {
    localUnits <- enterprise \* LocalUnit
    localName <- localUnits \* LocalUnitName
  } {
    println("  " + text(localName))
  }
  // do a save
}
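Pulling in each LocalUnit one at a time is harder: you would need a separate path for each of the sub-parts of an Enterprise that are not LocalUnits.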
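This does not reproduce the Scales separate-paths approach; it is a swapped-in plain StAX sketch showing what the one-LocalUnit-at-a-time variant involves with a raw pull loop. Each LocalUnit is handed to a callback as soon as its end tag is read, and only the enclosing Enterprise's RegNo is remembered. It assumes, as in the sample XML, that `RegNo` appears before the LocalUnits; `LocalUnitStream` and the `(regNo, cfarNo, luName)` callback shape are hypothetical.

```scala
import java.io.Reader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

object LocalUnitStream {
  // Calls `f(regNo, cfarNo, luName)` once per </LocalUnit>; apart from the
  // enclosing Enterprise's RegNo nothing is kept after the callback returns.
  // Assumes <RegNo> precedes the LocalUnits inside each Enterprise (as in the sample).
  def foreachLocalUnit(source: Reader)(f: (String, String, String) => Unit): Unit = {
    val factory = XMLInputFactory.newInstance()
    // Assumption: with DTD support off the reader will not try to fetch info.dtd.
    factory.setProperty(XMLInputFactory.SUPPORT_DTD, java.lang.Boolean.FALSE)
    val reader = factory.createXMLStreamReader(source)
    var path = List.empty[String] // open elements, innermost first
    val text = new StringBuilder
    var regNo = ""; var cfarNo = ""; var luName = ""
    try {
      while (reader.hasNext) {
        reader.next() match {
          case XMLStreamConstants.START_ELEMENT =>
            path = reader.getLocalName :: path
            text.clear()
          case XMLStreamConstants.CHARACTERS | XMLStreamConstants.CDATA =>
            text.append(reader.getText)
          case XMLStreamConstants.END_ELEMENT =>
            val value = text.toString.trim
            text.clear()
            path match { // head is the element that just closed
              case "RegNo" :: "Enterprise" :: _ => regNo = value
              case "CFARNo" :: "LocalUnit" :: _ => cfarNo = value
              case "LUName" :: "LocalUnit" :: _ => luName = value
              case "LocalUnit" :: _ =>
                f(regNo, cfarNo, luName) // hand off this unit, then forget it
                cfarNo = ""; luName = ""
              case "Enterprise" :: _ => regNo = ""
              case _ =>
            }
            path = path.tail
          case _ => // other events are skipped
        }
      }
    } finally reader.close()
  }
}
```

The trade-off is that the Enterprise-level data must be written (or buffered) separately from its LocalUnits, since the Enterprise object is never assembled in one piece.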
HTH
Answer 1 (score: -1)
Maybe this can help you? I think working with small snippets of XML works really well in Scala. http://joncook.github.io/blog/2013/11/03/xml-processing-with-scala/