我正在尝试为XML.Light数据类型编写一个高效的XML呈现,我试图用Data.Text.Lazy.Builder
来做这个,因为这似乎是一个明显的选择。但是,我很难从我的解决方案中获得任何性能:
{-# LANGUAGE OverloadedStrings #-}
import Data.Text (Text, unpack)
import Text.XML.Light
import qualified Data.Text.Lazy as LT
import qualified Data.Text.Lazy.Builder as LB
import Data.Foldable (foldMap)
import Data.Monoid (mconcat)
data Tag = Tag !Text
data Artist = Artist { artistName :: !Text , artistTags :: ![Tag] }
class ToXML a where toXML :: a -> Content
instance ToXML Artist where
toXML a = Elem $
Element (unqual "artist") []
[ text (artistName a)
, Elem $ Element (unqual "tag-list") []
(map toXML (artistTags a))
Nothing
]
Nothing
instance ToXML Tag where
toXML (Tag t) = Elem $ Element (unqual "tag") [] [ text t ] Nothing
text :: Text -> Content
text t = Text $ CData CDataText (unpack t) Nothing
render :: Content -> LB.Builder
render (Elem e) = renderElement e
render (Text s) = LB.fromString (cdData s)
renderElement :: Element -> LB.Builder
renderElement element = mconcat
[ LB.singleton '<'
, LB.fromString . qName . elName $ element
, LB.singleton '>'
, foldMap render (elContent element)
, LB.fromText "</"
, LB.fromString . qName .elName $ element
, LB.singleton '>'
]
main :: IO ()
main = let artist = Artist "Nirvana" (replicate 5000000 (Tag "Hi"))
xml = Element (unqual "metadata") [] [ toXML artist ] Nothing
in print (LT.length . LB.toLazyText . renderElement $ xml)
根据+RTS -s
:
7,368,153,472 bytes allocated in the heap
2,625,983,944 bytes copied during GC
708,149,024 bytes maximum residency (13 sample(s))
21,954,496 bytes maximum slop
1443 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 14270 colls, 0 par 1.65s 1.69s 0.0001s 0.0009s
Gen 1 13 colls, 0 par 2.57s 2.80s 0.2157s 1.2388s
TASKS: 3 (1 bound, 2 peak workers (2 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 1.81s ( 1.84s elapsed)
GC time 4.22s ( 4.50s elapsed)
EXIT time 0.07s ( 0.09s elapsed)
Total time 6.11s ( 6.43s elapsed)
Alloc rate 4,064,658,288 bytes per MUT second
Productivity 30.8% of total user, 29.3% of total elapsed
哪个太糟糕了。不仅是底层生产力,超过7GiB在堆中分配以呈现64MB的XML。这看起来非常低效!但是,我不知道所有这些垃圾究竟来自哪里。我使用+RTS -p
生成了一个堆配置文件,并使用hp2ps
:
我还用+RTS -l
运行它并用ThreadScope渲染它:
令人遗憾的是,我现在不知道该怎么做,将这些部分放在一起以提高效率,降低内存使用率。我确实想知道XML.Light
中的类型是否不是最优的(没有严格,String
超过Text
)但是仍然 - 这个慢?
我还观察到其他一些我觉得有点奇怪的东西。如果我将main
更改为:
main :: IO ()
main = let artist = Artist "Nirvana" (replicate 5000000 (Tag "Hi"))
xml = Element (unqual "metadata") [] [ toXML artist ] Nothing
in print (LT.length $ LB.toLazyText $ mconcat $ map (render.toXML) $ artistTags artist)
生产力下降高达94%,所以可能与toXML
中的递归有关,这是有问题的并且过于懒惰。
答案 0 :(得分:2)
我解决了这个问题,我认为这是GHC中的一个错误。
如果我们更改此行:
, LB.fromString . qName . elName $ element
进入这个:
, LB.fromString $ qName . elName $ element
然后我们得到了我们期望的表现。似乎用LB.fromString
组合qName
会阻止一些内联,因此融合不会发生。我认为这是非常危险的,所以我将把这个问题转移到关于GHCs bug追踪器的错误报告中,看看那里的聪明人是怎么想的。
谈论一个问题!