鉴于深度嵌入简单的数据处理DSL [1]:
{-# LANGUAGE GADTs, StandaloneDeriving #-}
import Data.List
import Text.Show.Functions
data Dist e where
Concat :: [Dist [a]] -> Dist [a]
-- We use ConcatMap as a primitive because it can express e.g.
-- both map and filter.
ConcatMap :: (a -> [b]) -> Dist [a] -> Dist [b]
-- Expensive to traverse input (think distributed file).
Input :: Dist [a]
Let :: Name -> Dist e -> Dist e -> Dist e
-- We're not dealing with name collisions here for simplicity.
Var :: Name -> Dist e
deriving instance Show (Dist e)
type Name = String
我们可以像这样实施熟悉的生产者 - 消费者融合
-- ---------------------------------------------------------------------
-- Producer-consumer fusion
-- Fuses adjacent ConcatMaps.
fuseProducerConsumer :: Dist e -> Dist e
fuseProducerConsumer = go
where
go :: Dist e -> Dist e
go (ConcatMap f (ConcatMap g e)) = ConcatMap (concatMap f . g) (go e)
go e = e
一个显示其工作原理的小例子:
-- Should be able to fuse this to a single ConcatMap.
producerConsumerFusable :: Dist [Int]
producerConsumerFusable = ConcatMap (singleton . (+ 1))
(ConcatMap (singleton . (* 2)) Input)
singleton :: a -> [a]
singleton = (: [])
-- Expected result after optimization.
expectedProducerConsumerResult =
ConcatMap (concatMap (singleton . (+ 1)) . (singleton . (* 2))) Input
还有一种不那么广为人知的[2]融合类型称为兄弟融合,它消除了同一输入的多次遍历。想法是替换像
这样的东西(map f xs, map g xs)
与
let ys = map (\ x -> (f x, g x)) xs
in (map fst ys, map snd ys)
如果遍历ys
比遍历xs
便宜得多(例如,如果xs
是网络上的文件),或者我们可以例如使用生产者 - 消费者融合来稍后融合其他一些遍历,这是一个胜利。
虽然使用上面的标准AST可以轻松实现生产者 - 消费者融合,但我不知道如何使用这种表示来实现兄弟融合。
-- ---------------------------------------------------------------------
-- Sibling fusion
-- Fuses ConcatMaps that consumer the same input.
fuseSibling :: Dist e -> Dist e
fuseSibling = id -- ???
我们想要发生的事情的一个例子:
-- The use of Concat below is not important, we just need some Dist e
-- that contains an opportunity for sibling fusion.
siblingFusable :: Dist [Int]
siblingFusable = Let "xs" Input $ -- shares one input
Concat [ConcatMap (singleton . (+ 1)) (Var "xs"),
ConcatMap (singleton . (* 2)) (Var "xs")]
-- Expected result after optimization.
expectedSiblingResult =
Let "xs" Input $
(Let "ys" (ConcatMap
(mapTwo (singleton . (+ 1)) (singleton . (* 2)))
(Var "xs")) -- only one traversal of "xs" and thus Input
(Concat [ConcatMap lefts (Var "ys"),
ConcatMap rights (Var "ys")]))
-- Some helper functions:
lefts :: Either a b -> [a]
lefts (Left x) = [x]
lefts _ = []
rights :: Either a b -> [b]
rights (Right x) = [x]
rights _ = []
mapTwo :: (a -> [b]) -> (a -> [c]) -> a -> [Either b c]
mapTwo f g x = map Left (f x) ++ map Right (g x)
问题在于,虽然我们可以通过ConcatMap ... (ConcatMap ... ...)
上的模式匹配轻松发现消费者 - 生产者融合机会,但是单一输入的两个消费者并不一定会产生兄弟般的融合机会&# 34;靠近"在AST中以相同的方式彼此相对。
如果我们能够以相反的方向遍历AST,即从Input
开始,那么一个输入的并行消费者将更容易被发现。鉴于每个操作仅涉及其输入而不是其输出,我无法看到如何执行此操作。
问题:可以使用此AST表示实现兄弟融合,还是可以使用其他一些(例如图形或基于延续的)表示来实现兄弟融合?最好仍然使用GADT进行类型安全。
答案 0 :(得分:2)
我创造了一个我将在世界上释放出来的怪物。这是您在伊德里斯的转型实施。
我首先在Haskell中开始研究这个问题,问题是我们基本上正在寻找一种方法来为每个变量收集一组函数f1 :: a -> b1, f2 :: a -> b2, ...
。在Haskell中提出一个很好的代表是很棘手的,因为一方面,我们想隐藏存在主义背后的b1, b2, ...
类型,但另一方面,当我们看到ConcatMap
我们需要时构造一个函数,从恰当类型的[Either b1 (Either b2 (...))]
中提取正确的坐标。
所以,首先,让我们确保我们的变量引用具有良好的范围和良好类型,通过使用范围中的变量索引Dist
并对变量出现使用De Bruijn索引:
%default total
Ctx : Type
Ctx = List Type
data VarPtr : Ctx -> Type -> Type where
here : VarPtr (a :: ctx) a
there : VarPtr ctx b -> VarPtr (a :: ctx) b
data Dist : Ctx -> Type -> Type where
Input : Dist ctx a
Concat2 : Dist ctx a -> Dist ctx a -> Dist ctx a
ConcatMap : (a -> List b) -> Dist ctx a -> Dist ctx b
Let : Dist ctx a -> Dist (a :: ctx) b -> Dist ctx b
Var : VarPtr ctx a -> Dist ctx a
可以看出,我对Dist
进行了两次简化:
无论如何,一切都是列表般的东西,所以ConcatMap
的类型为Dist ctx a -> Dist ctx b
,而不是Dist ctx (List a) -> Dist ctx (List b)
。只使用原始问题中提供的组合器,无论如何,Dist
唯一能够构建的值都是列表。这使得实现更简单(换句话说,在我进行此更改之前,我遇到了各种不必要的并发症)。
Concat2
是二进制而不是 n -ary。将fuseHoriz
更改为支持 n -ary concatenation是一个留给读者的练习。
让我们先实现垂直融合,只是为了让我们的脚湿透:
fuseVert : Dist ctx a -> Dist ctx a
fuseVert Input = Input
fuseVert (Concat2 xs ys) = Concat2 (fuseVert xs) (fuseVert ys)
fuseVert (ConcatMap f d) = case fuseVert d of
ConcatMap g d' => ConcatMap (concatMap f . g) d'
d' => ConcatMap f d'
fuseVert (Let d0 d) = Let (fuseVert d0) (fuseVert d)
fuseVert (Var k) = Var k
到目前为止一切顺利:
namespace Examples
f : Int -> List Int
f = return . (+1)
g : Int -> List Int
g = return . (* 2)
ex1 : Dist [] Int
ex1 = ConcatMap f $ ConcatMap g $ Input
ex1' : Dist [] Int
ex1' = ConcatMap (concatMap f . g) $ Input
prf : fuseVert ex1 = ex1'
prf = Refl
现在为有趣的部分。我们需要很好地表达来自同一领域的#34;功能集合。以及指向该集合中特定功能(具有特定codomain)的方法。我们将从ConcatMap f (Var v)
调用中收集这些函数,并由v
键入;然后用一个在我们收集完所有内容后填充的洞来替换调用本身。
当我们遇到Concat2 d1 d2
时,我们需要合并从双方收集的函数,然后削弱d1
和d2
中的漏洞以覆盖此扩展集合。
我正在使用二叉树而不是平面列表:因此弱化很容易实现。
由于我重复使用here
/ there
术语,因此它位于自己的命名空间中:
namespace Funs
data Funs : Type -> Type where
None : Funs a
Leaf : (a -> List b) -> Funs a
Branch : Funs a -> Funs a -> Funs a
instance Semigroup (Funs a) where
(<+>) = Branch
data FunPtr : Funs a -> Type -> Type where
here : FunPtr (Leaf {b} _) b
left : FunPtr fs b -> FunPtr (Branch fs _) b
right : FunPtr fs b -> FunPtr (Branch _ fs) b
现在我们已经对给定变量上应用的所有函数的集合进行了表示,我们最终可以在实现水平融合方面取得一些进展。
重申一下,目标是转变为
let xs = Input :: [A]
in Concat2 (E $ ConcatMap f xs) (F $ ConcatMap g xs)
where
f :: A -> [B]
g :: A -> [C]
类似
let xs = Input :: [A]
xs' = ConcatMap (\x -> map Left (f x) ++ map Right (g x)) xs :: [(Either B C)]
in Concat2 (E $ ConcatMap (either return (const []) xs') (F $ ConcatMap (either (const []) return) xs')
首先,我们需要能够从xs'
上应用的函数集合中对memoizer(xs
的定义)进行代码生成:
memoType : Funs a -> Type
memoType None = ()
memoType (Leaf {b} _) = b
memoType (Branch fs1 fs2) = Either (memoType fs1) (memoType fs2)
memoFun : (fs : Funs a) -> (a -> List (memoType fs))
memoFun None = const []
memoFun (Leaf f) = f
memoFun (Branch fs1 fs2) = (\xs => map Left (memoFun fs1 xs) <+> map Right (memoFun fs2 xs))
memoExpr : (fs : Funs a) -> Dist (a :: ctx) (memoType fs)
memoExpr fs = ConcatMap (memoFun fs) (Var here)
如果我们以后无法查找这些记忆结果,那么它将不会有多大用处:
lookupMemo : {fs : Funs a} -> (i : FunPtr fs b) -> (memoType fs -> List b)
lookupMemo {fs = Leaf f} here = \x => [x]
lookupMemo {fs = (Branch fs1 fs2)} (left i) = either (lookupMemo i) (const [])
lookupMemo {fs = (Branch fs1 fs2)} (right i) = either (const []) (lookupMemo i)
现在,当我们遍历源代码树时,我们当然会同时收集几个变量的用法(通过ConcatMap
),因为它完全可以像
let xs = ...
in Concat2 (ConcatMap f xs) (let ys = ... in ... (ConcatMap g xs) ...)
这将与变量上下文一起填充,因为在每个Let
绑定中,我们还可以生成新变量的所有用法的memoizer。
namespace Usages
data Usages : Ctx -> Type where
Nil : Usages []
(::) : {a : Type} -> Funs a -> Usages ctx -> Usages (a :: ctx)
unused : {ctx : Ctx} -> Usages ctx
unused {ctx = []} = []
unused {ctx = _ :: ctx} = None :: unused {ctx}
instance Semigroup (Usages ctx) where
[] <+> [] = []
(fs1 :: us1) <+> (fs2 :: us2) = (fs1 <+> fs2) :: (us1 <+> us2)
我们将为这些合成变量保留空间:
ctxDup : {ctx : Ctx} -> Usages ctx -> Ctx
ctxDup {ctx = []} us = []
ctxDup {ctx = t :: ts} (fs :: us) = (memoType fs) :: t :: ctxDup us
varDup : {us : Usages ctx} -> VarPtr ctx a -> VarPtr (ctxDup us) a
varDup {us = _ :: _} here = there here
varDup {us = _ :: _} (there v) = there $ there $ varDup v
现在我们终于准备好定义优化器的内部中间表示:&#34; Dist
带孔&#34;。每个孔代表一个函数在变量上的应用,当我们知道所有的用法时,它将被填充,并且我们在范围内有它们的所有合成变量:
namespace HDist
data Hole : Usages ctx -> Type -> Type where
here : FunPtr u b -> Hole (u :: us) b
there : Hole us b -> Hole (_ :: us) b
resolve : {us : Usages ctx} -> Hole us b -> Exists (\a => (VarPtr (ctxDup us) a, a -> List b))
resolve (here i) = Evidence _ (here, lookupMemo i)
resolve (there h) with (resolve h) | Evidence a (v, f) = Evidence a (there $ there v, f)
data HDist : Usages ctx -> Type -> Type where
HInput : HDist us a
HConcat : HDist us a -> HDist us a -> HDist us a
HConcatMap : (b -> List a) -> HDist us b -> HDist us a
HLet : HDist us a -> (fs : Funs a) -> HDist (fs :: us) b -> HDist us b
HVar : {ctx : Ctx} -> {us : Usages ctx} -> VarPtr ctx a -> HDist us a
HHole : (hole : Hole us a) -> HDist us a
所以一旦我们有了这么多洞穴Dist
,填充它只是走路和解决洞的问题:
fill : HDist us a -> Dist (ctxDup us) a
fill HInput = Input
fill (HConcat e1 e2) = Concat2 (fill e1) (fill e2)
fill (HConcatMap f e) = ConcatMap f $ fill e
fill (HLet e0 fs e) = Let (fill e0) $ Let (memoExpr fs) $ fill e
fill (HVar x) = Var (varDup x)
fill (HHole h) with (resolve h) | Evidence a (v, f) = ConcatMap f $ Var v
然后,水平融合只是肘部油脂的问题:将Dist ctx a
变成HDist us a
,使每个ConcatMap f (Var v)
变成HHole
。当我从Usages
的两边组合两个Concat2
时,我们需要做一些额外的有趣的舞蹈来改变漏洞。
weakenHoleL : Hole us1 a -> Hole (us1 <+> us2) a
weakenHoleL {us1 = _ :: _} {us2 = _ :: _} (here i) = here (left i)
weakenHoleL {us1 = _ :: _} {us2 = _ :: _} (there h) = there $ weakenHoleL h
weakenHoleR : Hole us2 a -> Hole (us1 <+> us2) a
weakenHoleR {us1 = _ :: _} {us2 = _ :: _} (here i) = here (right i)
weakenHoleR {us1 = _ :: _} {us2 = _ :: _} (there h) = there $ weakenHoleR h
weakenL : HDist us1 a -> HDist (us1 <+> us2) a
weakenL HInput = HInput
weakenL (HConcat e1 e2) = HConcat (weakenL e1) (weakenL e2)
weakenL (HConcatMap f e) = HConcatMap f (weakenL e)
weakenL {us1 = us1} {us2 = us2} (HLet e fs x) = HLet (weakenL e) (Branch fs None) (weakenL {us2 = None :: us2} x)
weakenL (HVar x) = HVar x
weakenL (HHole hole) = HHole (weakenHoleL hole)
weakenR : HDist us2 a -> HDist (us1 <+> us2) a
weakenR HInput = HInput
weakenR (HConcat e1 e2) = HConcat (weakenR e1) (weakenR e2)
weakenR (HConcatMap f e) = HConcatMap f (weakenR e)
weakenR {us1 = us1} {us2 = us2} (HLet e fs x) = HLet (weakenR e) (Branch None fs) (weakenR {us1 = None :: us1} x)
weakenR (HVar x) = HVar x
weakenR (HHole hole) = HHole (weakenHoleR hole)
fuseHoriz : Dist ctx a -> Exists {a = Usages ctx} (\us => HDist us a)
fuseHoriz Input = Evidence unused HInput
fuseHoriz (Concat2 d1 d2) with (fuseHoriz d1)
| Evidence us1 e1 with (fuseHoriz d2)
| Evidence us2 e2 =
Evidence (us1 <+> us2) $ HConcat (weakenL e1) (weakenR e2)
fuseHoriz {ctx = _ :: ctx} (ConcatMap f (Var here)) =
Evidence (Leaf f :: unused) (HHole (here here))
fuseHoriz (ConcatMap f d) with (fuseHoriz d)
| Evidence us e = Evidence us (HConcatMap f e)
fuseHoriz (Let d0 d) with (fuseHoriz d0)
| Evidence us0 e0 with (fuseHoriz d)
| Evidence (fs :: us) e =
Evidence (us0 <+> us) $ HLet (weakenL e0) (Branch None fs) $ weakenR {us1 = None :: us0} e
fuseHoriz (Var v) = Evidence unused (HVar v)
我们可以通过将其与fuseVert
相结合并将其提供给fill
来使用此怪物:
fuse : Dist [] a -> Dist [] a
fuse d = fill $ getProof $ fuseHoriz . fuseVert $ d
并且presto:
namespace Examples
ex2 : Dist [] Int
ex2 = Let Input $
Concat2 (ConcatMap f (Var here))
(ConcatMap g (Var here))
ex2' : Dist [] Int
ex2' = Let Input $
Let (ConcatMap (\x => map Left [] ++ map Right (map Left (f x) ++ map Right (g x))) (Var here)) $
Concat2 (ConcatMap f' (Var here)) (ConcatMap g' (Var here))
where
f' : Either () (Either Int Int) -> List Int
f' = either (const []) $ either return $ const []
g' : Either () (Either Int Int) -> List Int
g' = either (const []) $ either (const []) $ return
prf2 : fuse ex2 = ex2'
prf2 = Refl
我希望我可以将融合 fuseVert
加入fuseHoriz
,因为我认为它应该需要的是一个额外的案例:
fuseHoriz (ConcatMap f (ConcatMap g d)) = fuseHoriz (ConcatMap (concatMap f . g) d)
但是,这会使Idris终止检查程序混淆,除非我在assert_smaller
vs ConcatMap (concatMap f . g) d
上添加了ConcatMap f (ConcatMap g d))
,我不明白为什么,因为我还有一层{ {1}}构造函数比另一个。