Question

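I frequently have occasion to perform modular arithmetic in Haskell, where the modulus is usually large and often prime (like 2000000011). Currently, I just use functions like (modAdd m a b), (modMul m a b), (modDiv m a b), etc. But that is rather inconvenient: it requires an additional parameter that must always be specified and carried around, and it forces me to write my various functions both in the regular integral form and separately in a mod form.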
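For reference, the explicit-modulus style described above might look like the sketch below. Only the names modAdd, modMul and modDiv come from the question; modSub and modPow are hypothetical helpers, and all the bodies are my assumption. modDiv uses Fermat's little theorem, so it is only valid for a prime m and a divisor not divisible by m. The sketch also assumes a 64-bit Int, so that the product of two residues below 2000000011 does not overflow.

```haskell
-- Explicit-modulus helpers: every function takes the modulus m first.
-- Assumes a 64-bit Int, so a product of two residues below 2000000011 fits.
modAdd, modSub, modMul :: Int -> Int -> Int -> Int
modAdd m a b = (a + b) `mod` m
modSub m a b = (a - b) `mod` m -- `mod` keeps the result in [0, m) for m > 0
modMul m a b = (a * b) `mod` m

-- Hypothetical helper: b^e mod m by repeated squaring (e >= 0).
modPow :: Int -> Int -> Int -> Int
modPow m b e
  | e == 0 = 1 `mod` m
  | even e = let h = modPow m b (e `div` 2) in modMul m h h
  | otherwise = modMul m (b `mod` m) (modPow m b (e - 1))

-- Division via Fermat's little theorem: b^(m-2) is the inverse of b modulo m.
-- Only valid when m is prime and b is not divisible by m.
modDiv :: Int -> Int -> Int -> Int
modDiv m a b = modMul m a (modPow m b (m - 2))
```

Every call has to repeat m, which is the inconvenience the question is trying to get rid of.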

So it occurs to me that it might be a good idea to create a new class, something like this:

class Integral a => Mod a

m = 2000000011

instance Integral a => Num (Mod a) where
   ... defining (+), etc. as modulo m

Then one could just perform regular arithmetic using regular functions, and define useful structures like

factorials :: [Mod Int]
factorials = 1:zipWith (*) factorials [1..]
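A runnable version of this single-modulus idea might look like the following sketch. Assumptions: a newtype (rather than a class) wrapping Int, the modulus hard-coded as 2000000011, and a 64-bit Int. Note that [1..] in the factorials definition above would also need an Enum instance for Mod Int; the sketch sidesteps that by mapping fromInteger over [1 ..].

```haskell
-- Hypothetical single global modulus, as in the question's first idea.
modulus :: Int
modulus = 2000000011

-- A newtype (not a class) so that Mod can be used as a type; values stay in [0, modulus).
newtype Mod = Mod Int deriving (Eq, Show)

mkMod :: Int -> Mod
mkMod x = Mod (x `mod` modulus)

instance Num Mod where
  Mod a + Mod b = mkMod (a + b)
  Mod a - Mod b = mkMod (a - b)
  -- Assumes a 64-bit Int: a * b < 2000000011^2 < 2^63.
  Mod a * Mod b = mkMod (a * b)
  fromInteger k = Mod (fromInteger (k `mod` toInteger modulus))
  abs = id
  signum (Mod a) = Mod (signum a)

-- The question's factorials, with fromInteger instead of an Enum instance for [1 ..].
factorials :: [Mod]
factorials = 1 : zipWith (*) factorials (map fromInteger [1 ..])
```

Because the modulus is a single top-level constant, every Mod value in the program necessarily shares it, which is exactly the limitation raised next.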

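But that has one problem: all values of type Mod Int necessarily must have the same modulus. However, I often need to work in multiple moduli in one program (of course always only combining values of the same modulus).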

I think, but do not understand sufficiently well to be sure, that this could be overcome by something like this:

class Integral a => Mod Nat a

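where Nat is a type that encodes the modulus in Peano fashion. This would be advantageous: I could have values of different moduli, and the type checker would save me from accidentally combining such values.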
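To make the Peano idea concrete, here is a sketch with a hypothetical Peano kind promoted via DataKinds and a hypothetical ReifyPeano class that recovers the number as an Int. Because Mod is a newtype, the modulus exists only in the type, and each value is represented exactly like an Int. The catch is the type itself: a modulus of 2000000011 would need a type with that many S constructors, and resolving the ReifyPeano instance for it would run far past GHC's default constraint-solver reduction depth. Also, peanoVal may walk the whole instance chain on each call. In practice this only works for tiny moduli like the ones below.

```haskell
{-# LANGUAGE DataKinds, FlexibleInstances, KindSignatures, ScopedTypeVariables #-}
import Data.Proxy (Proxy (..))

-- Peano naturals; DataKinds promotes Z and S to the type level.
data Peano = Z | S Peano

-- Hypothetical class that turns a type-level Peano number back into an Int.
class ReifyPeano (n :: Peano) where
  peanoVal :: Proxy n -> Int

instance ReifyPeano 'Z where
  peanoVal _ = 0

instance ReifyPeano n => ReifyPeano ('S n) where
  peanoVal _ = 1 + peanoVal (Proxy :: Proxy n)

-- The modulus lives only in the type; a Mod n is represented like an Int.
newtype Mod (n :: Peano) = Mod Int deriving (Eq, Show)

-- Reduce an Int modulo the type-level modulus n.
mkMod :: forall n. ReifyPeano n => Int -> Mod n
mkMod x = Mod (x `mod` peanoVal (Proxy :: Proxy n))

instance ReifyPeano n => Num (Mod n) where
  Mod a + Mod b = mkMod (a + b)
  Mod a - Mod b = mkMod (a - b)
  Mod a * Mod b = mkMod (a * b)
  fromInteger k = Mod (fromInteger (k `mod` toInteger (peanoVal (Proxy :: Proxy n))))
  abs = id
  signum (Mod a) = Mod (signum a)

-- Small moduli only: each S is one more constructor the type checker must handle.
type Three = 'S ('S ('S 'Z))
type Five = 'S ('S ('S ('S ('S 'Z))))
```

Mixing moduli is now a type error: a Mod Three cannot be added to a Mod Five.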

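Is something like this feasible and efficient? Will it cause either the compiler or the RTS to try to actually construct the enormous (Succ (Succ (Succ ... repeated 2000000011 times) if I try to use that modulus, rendering the solution effectively useless? Will the RTS try to check the type match on every operation? Will the RTS representation of every value be blown up from what could otherwise be just an unboxed int?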

Is there a better way?

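Conclusion

Thanks to helpful comments from cirdec, dfeuer, user5402 and tikhon-jelvis, I learned that (unsurprisingly) I was not the first to have this idea. In particular, there is a recent paper by Kiselyov and Shan that gives an implementation, and tikhon-jelvis has posted to Hackage a solution called (surprise!) modular-arithmetic, which offers even nicer semantics using fancy GHC pragmas.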

Open Question (to me)

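What happens behind the scenes? In particular, would a million-element list of [Mod Int 2000000011] carry around an extra million copies of 2000000011? Or does it compile to the same code as a list of a million Ints, with the modulus parameter carried separately? The latter would be nice.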
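One way to see what the representation can be: if the type is a newtype whose modulus is only a phantom type parameter (an assumption here; whether the modular-arithmetic package defines its type this way has to be checked in its source), then a list of such values has exactly the same runtime representation as a list of Int, which Data.Coerce.coerce can witness at zero cost. The modulus is then not stored per element; a function that needs it receives it through the KnownNat dictionary, once per call rather than once per element. The Mod type and the two helpers below are hypothetical.

```haskell
{-# LANGUAGE DataKinds, KindSignatures #-}
import Data.Coerce (coerce)
import GHC.TypeLits (Nat)

-- Hypothetical modular type: n is a phantom type-level modulus,
-- so a Mod n is represented at runtime exactly like an Int.
newtype Mod (n :: Nat) = Mod Int deriving (Eq, Show)

-- Zero-cost conversions: coerce only type-checks because Mod n and Int
-- share a representation, and it does not traverse or copy the list.
toInts :: [Mod n] -> [Int]
toInts = coerce

-- Unchecked: assumes the Ints are already reduced modulo n.
fromIntsUnchecked :: [Int] -> [Mod n]
fromIntsUnchecked = coerce
```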

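Performance Addendum

I ran a bit of a benchmark on a current problem I am working on. The first run used an unboxed 10,000-element Vector Int and performed 10,000 operations on it: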

   4,810,589,520 bytes allocated in the heap
         107,496 bytes copied during GC
       1,197,320 bytes maximum residency (1454 sample(s))
         734,960 bytes maximum slop
              10 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      6905 colls,     0 par    0.109s   0.101s     0.0000s    0.0006s
  Gen  1      1454 colls,     0 par    0.812s   0.914s     0.0006s    0.0019s

  TASKS: 13 (1 bound, 12 peak workers (12 total), using -N11)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    2.672s  (  2.597s elapsed)
  GC      time    0.922s  (  1.015s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time    3.594s  (  3.614s elapsed)

  Alloc rate    1,800,454,557 bytes per MUT second

  Productivity  74.3% of total user, 73.9% of total elapsed

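For the second run, I performed the same operations on an unboxed 10,000-element Vector of (Mod Int 1000000007). That made my code a little simpler, but it took about 3 times as long (with a nearly identical memory profile):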

   4,810,911,824 bytes allocated in the heap
         107,128 bytes copied during GC
       1,199,408 bytes maximum residency (1453 sample(s))
         736,928 bytes maximum slop
              10 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      6906 colls,     0 par    0.094s   0.107s     0.0000s    0.0007s
  Gen  1      1453 colls,     0 par    1.516s   1.750s     0.0012s    0.0035s

  TASKS: 13 (1 bound, 12 peak workers (12 total), using -N11)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.000s  (  0.001s elapsed)
  MUT     time    8.562s  (  8.323s elapsed)
  GC      time    1.609s  (  1.857s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time   10.172s  ( 10.183s elapsed)

  Alloc rate    561,858,315 bytes per MUT second

  Productivity  84.2% of total user, 84.1% of total elapsed

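I wonder why this happens and whether it can be fixed. Still, I really like the modular-arithmetic package and will use it where performance is not absolutely critical.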

Answer 1

Here is some working code using Data.Reflection:

{-# LANGUAGE Rank2Types #-}
{-# LANGUAGE FlexibleContexts #-}

import Data.Reflection
import Data.Proxy

data M a s = M a -- Note the phantom comes *after* the concrete

-- In `normalize` we're tying the knot to get the phantom types to align
-- note that reflect :: Reifies s a => forall proxy. proxy s -> a

normalize :: (Reifies s a, Integral a) => a -> M a s
normalize a = b where b = M (mod a (reflect b)) 

instance (Reifies s a, Integral a) => Num (M a s) where
  M a + M b = normalize (a + b)
  M a - M b = normalize (a - b)
  M a * M b = normalize (a * b)
  fromInteger n = normalize (fromInteger n)
  abs _     = error "abs not implemented"
  signum _  = error "sgn not implemented"

withModulus :: Integral a => a -> (forall s. Reifies s a => M a s) -> a
withModulus m ma = reify m (runM . asProxyOf ma)
  where asProxyOf :: f s -> Proxy s -> f s
        asProxyOf a _ = a

runM :: M a s -> a
runM (M a) = a

example :: (Reifies s a, Integral a) => M a s
example = normalize 3

example2 :: (Reifies s a, Integral a, Num (M a s)) => M a s
example2 = 3*3 + 5*5

mfactorial :: (Reifies s a, Integral a, Num (M a s)) => Int -> M a s
mfactorial n = product $ map fromIntegral [1..n]

test1 :: Integral a => a -> Int -> a
test1 p n = withModulus p $ mfactorial n

madd :: (Reifies s Int, Num (M Int s)) => M Int s -> M Int s -> M Int s
madd a b = a + b

test2 :: Int -> Int -> Int -> Int
test2 p a b = withModulus p $ madd (fromIntegral a) (fromIntegral b)
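For comparison, GHC.TypeLits can play the same role as reify: someNatVal turns a runtime Integer into an existentially quantified KnownNat type, so the modulus can still be chosen at runtime. This is a swapped-in technique, not part of the Data.Reflection answer above; the Mod newtype and the withModulusNat and mfactorialNat names are hypothetical, and the sketch keeps values as Integer for simplicity.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, RankNTypes, ScopedTypeVariables #-}
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, SomeNat (..), natVal, someNatVal)

-- Hypothetical modular type indexed by a type-level natural.
newtype Mod (n :: Nat) = Mod Integer deriving (Eq, Show)

-- Reduce modulo the n carried in the type.
mkMod :: forall n. KnownNat n => Integer -> Mod n
mkMod x = Mod (x `mod` natVal (Proxy :: Proxy n))

instance KnownNat n => Num (Mod n) where
  Mod a + Mod b = mkMod (a + b)
  Mod a - Mod b = mkMod (a - b)
  Mod a * Mod b = mkMod (a * b)
  fromInteger = mkMod
  abs = id
  signum (Mod a) = Mod (signum a)

-- Like withModulus above, but built on someNatVal instead of reify.
withModulusNat :: Integer -> (forall n. KnownNat n => Mod n) -> Integer
withModulusNat m x
  | m <= 0 = error "withModulusNat: modulus must be positive"
  | otherwise = case someNatVal m of
      -- The pattern signature binds the existential n so x can be used at it.
      Just (SomeNat (_ :: Proxy n)) -> let Mod r = (x :: Mod n) in r
      Nothing -> error "withModulusNat: unreachable for positive m"

mfactorialNat :: KnownNat n => Int -> Mod n
mfactorialNat k = product (map fromIntegral [1 .. k])
```

As with the reflection version, each call to withModulusNat can use a different modulus, and the rank-2 type keeps values from different calls apart.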

Answer 2

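Newer versions of GHC have type-level numbers built in, which should be more efficient than ones you roll yourself using Peano arithmetic. You can use them by enabling DataKinds. As a bonus, you also get some nice syntax: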

factorials :: [Mod Int 20]
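A minimal sketch of such a type with GHC's built-in type-level naturals, reducing after every operation. It is simplified to a single type parameter (Mod n, backed by Int) rather than the Mod Int 20 form above, and the names are hypothetical, not the modular-arithmetic package's API.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}
import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)

-- Hypothetical modular type: the modulus n is a GHC type-level literal.
newtype Mod (n :: Nat) = Mod Int deriving (Eq, Show)

-- Reduce an Int modulo n; natVal reads n back from the type as an Integer.
mkMod :: forall n. KnownNat n => Int -> Mod n
mkMod x = Mod (x `mod` fromInteger (natVal (Proxy :: Proxy n)))

-- Reduce after every operation (the "most general" strategy).
instance KnownNat n => Num (Mod n) where
  Mod a + Mod b = mkMod (a + b)
  Mod a - Mod b = mkMod (a - b)
  Mod a * Mod b = mkMod (a * b)
  fromInteger k = Mod (fromInteger (k `mod` natVal (Proxy :: Proxy n)))
  abs = id
  signum (Mod a) = Mod (signum a)

-- The local, unannotated fs is monomorphic, so the list is shared.
factorials :: KnownNat n => [Mod n]
factorials = fs
  where
    fs = 1 : zipWith (*) fs (map fromInteger [1 ..])
```

With this, factorials :: [Mod 2000000011] works directly, and adding a Mod 7 to a Mod 11 is rejected by the type checker. Routing the recursion through a local fs is deliberate: a recursive reference to the constrained top-level factorials may be re-instantiated at each use instead of shared.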

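Whether this is efficient depends on how you implement the Mod type. In the most general case, you probably want to just take the mod after every arithmetic operation. Unless you're in a hot loop where saving a handful of instructions matters, this should be fine. (And inside a hot loop, it's probably better to be explicit about when you reduce anyway.)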
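As an illustration of being explicit about when to reduce, here is a sketch with two hypothetical functions over plain Int residues: one takes the mod after every addition, the other sums first and reduces once. The deferred version is only safe when the unreduced total cannot overflow Int, which is the kind of bookkeeping a reduce-after-every-operation type does for you.

```haskell
import Data.List (foldl')

-- Reduce after every addition: one `mod` per element. Assumes inputs are
-- in [0, m) and that 2 * (m - 1) fits in an Int.
sumModEager :: Int -> [Int] -> Int
sumModEager m = foldl' (\acc x -> (acc + x) `mod` m) 0

-- Reduce once at the end. Assumes every x is in [0, m) and that
-- length xs * (m - 1) <= maxBound, so the plain sum cannot overflow.
sumModDeferred :: Int -> [Int] -> Int
sumModDeferred m xs = foldl' (+) 0 xs `mod` m
```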

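I actually implemented this type in a library on Hackage: modular-arithmetic. It has a test suite but no benchmarks, so I can't vouch for its absolute performance, but it isn't doing anything that should be slow, and it was fast enough for my purposes. (Which, admittedly, involved small moduli.) If you try it and run into performance issues, I'd love to hear about them so I can try to fix them.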

Modular arithmetic using Haskell type families or GADTs?
