Rsync's rolling checksum in Haskell

Time: 2014-08-14 09:14:10

Tags: haskell checksum

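I have started learning Haskell and remote delta compression. My first step is to implement rsync's rolling checksum in Haskell. Are the chunks in those formulas equal to X(i)? If so, I am confused.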

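  • Haskell can convert a ByteString into an array of bytes.
  • How would I convert an array of Word8 into that big chunk, a Word32768? I mean, what if X(i) is a list of Word8 values?
  • After that, how do I do arithmetic on those 4 KB unsigned integers?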

Also, my current implementation slides the window by only 1 byte (one Word8) at a time.

2 answers:

Answer 0: (score: 2)

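  1. With ByteString it is easy to get a [Word8] using unpack, which should be enough to perform this algorithm (although not necessarily in the most efficient way).

  2. Why do you need to convert Word8 values into a Word32768? Why do you need a 2^15-bit number? That would be hard to represent, but you can use a list or array of Word8, which is easy to represent in memory and is equivalent.

  3. To perform the arithmetic, functions such as map, zipWith, fold, and scan are very useful. For example, to perform the first step of the algorithm: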

    import qualified Data.ByteString as BS

    -- Sum of the (l - k) bytes starting at index k, reduced modulo 2^16.
    a :: Int -> Int -> BS.ByteString -> Int
    a k l x
        = (`mod` m)
        $ sum
        $ map fromIntegral
        $ take (l - k)
        $ drop k
        $ BS.unpack x
        where m = 2 ^ 16
    

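    Implementing the function b is only slightly more difficult: you just need to compute the sequence l - i + 1 for i = k to l, then use zipWith (*) between the map fromIntegral and take (l - k) steps. After that, implementing the combined checksum is straightforward, but it can certainly be done more efficiently if you factor out the steps that the functions have in common.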
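The description above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the names `modulus`, `window`, `sumA`, `sumB`, and `checksum` are made up here, and the sketch assumes the window includes both X_k and X_l (so it takes l - k + 1 bytes) and that the two sums are combined as a + 2^16 * b.

```haskell
import qualified Data.ByteString as BS

-- Modulus M = 2^16, the same value the snippet above uses.
modulus :: Int
modulus = 2 ^ (16 :: Int)

-- Bytes X_k .. X_l (0-based, inclusive), the step shared by both sums.
-- Assumption: the window includes X_l, hence l - k + 1 bytes.
window :: Int -> Int -> BS.ByteString -> [Int]
window k l = map fromIntegral . BS.unpack . BS.take (l - k + 1) . BS.drop k

-- a(k,l): plain sum of the window, mod M.
sumA :: [Int] -> Int
sumA xs = sum xs `mod` modulus

-- b(k,l): weighted sum where byte X_i gets weight l - i + 1,
-- i.e. the first byte weighs n and the last byte weighs 1.
sumB :: [Int] -> Int
sumB xs = sum (zipWith (*) weights xs) `mod` modulus
  where
    n       = length xs
    weights = [n, n - 1 .. 1]

-- Combined checksum, assuming s = a + 2^16 * b.
checksum :: Int -> Int -> BS.ByteString -> Int
checksum k l x = sumA xs + modulus * sumB xs
  where
    xs = window k l x
```

Computing `window` once and passing the list to both sums is one way to factor out the shared drop/take/unpack steps.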

Answer 1: (score: 2)

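In the equations in the linked material, a chunk is not equal to X(i). Chunks mostly relate to Data Deduplication. In addition, a rolling checksum can be used to create chunks, identify chunk boundaries, and so on.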
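To make "identify chunk boundaries" a bit more concrete, here is a toy sketch, not something rsync itself does: it keeps a running sum of the last w bytes and declares a boundary wherever the low bits of that sum are all zero. The names `windowSums` and `boundaries`, and the choice of a plain sum with a bit mask, are assumptions made purely for illustration.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8, Word16)

-- Sum of every w-byte window, updated incrementally as the window slides.
-- Word16 arithmetic wraps around, so the sums are implicitly mod 2^16.
windowSums :: Int -> [Word8] -> [Word16]
windowSums w xs
  | w <= 0 || length xs < w = []
  | otherwise = scanl step start (zip xs (drop w xs))
  where
    start = sum (map fromIntegral (take w xs))
    -- Drop the byte leaving the window, add the byte entering it.
    step acc (old, new) = acc - fromIntegral old + fromIntegral new

-- Indices of the last byte of each window whose sum has no bits in common
-- with the mask; a chunker could cut the input after those indices.
boundaries :: Int -> Word16 -> [Word8] -> [Int]
boundaries w mask xs =
  [ i | (i, s) <- zip [w - 1 ..] (windowSums w xs), s .&. mask == 0 ]
```

Because a boundary depends only on the bytes inside the window, inserting data earlier in the input shifts the boundaries instead of changing all of them, which is what makes this style of chunking useful for deduplication.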

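Also, my current implementation of rsync's rolling checksum is shown below. Next I will implement the cyclic polynomial rolling checksum, and then read some books on Data Deduplication.
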
import qualified Data.ByteString.Lazy as B
import qualified Data.ByteString.Lazy.Char8 as B8
import Data.Word
import Data.Bits
import Data.Int

type CheckSumPartial = Word16
type CheckSumA = CheckSumPartial
type CheckSumB = CheckSumPartial
type WindowSize = Int64
type CheckSum = Word32
type Byte = Word8

main:: IO ()
main = do
  let str = B8.pack "abcdef"
  let s1 = roll 3 str
  let s2 = withoutRoll 3 str
  print s1
  print s2
  return ()

roll :: WindowSize -> B.ByteString -> [CheckSum]
roll w str = 
  let
    (a,b,s) = newABS w str
    h = B.head str
    t = B.tail str
  in if fromIntegral (B.length t) < w
        then [s]
        else s : rollNext w t h a b

withoutRoll :: WindowSize -> B.ByteString -> [CheckSum]
withoutRoll w str =
  let
    (_,_,s) = newABS w str
    t = B.tail str
  in if fromIntegral (B.length t) <  w
      then [s]
      else s : withoutRoll w t

newA :: WindowSize -> B.ByteString -> CheckSumA
newA w str = 
  let    block = B.take w str
  in B.foldr aSum (0::CheckSumA) block
  where
    aSum x acc = acc + (fromIntegral x :: CheckSumA)

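newB :: WindowSize -> B.ByteString -> CheckSumB
newB w str = 
  let block = B.take w str
  in fst $ B.foldr bSum (0::CheckSumB, 1::WindowSize) block
  where
    -- foldr visits the last byte first, so the weight starts at 1 for the last byte
    -- and grows to w for the first byte, matching the w * prevHead term in rollB.
    bSum x (acc,l) = (acc +  fromIntegral l * (fromIntegral x :: CheckSumB), l+1) 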

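-- Slide a by one byte: drop the byte that left the window, add the byte that entered.
rollA :: CheckSumA -> Byte -> Byte -> CheckSumA
rollA prevA prevHead curLast = prevA - fromIntegral prevHead + fromIntegral curLast

-- Slide b by one byte: the departed byte had weight w; every remaining byte gains
-- one unit of weight, which adding the new a accounts for.
rollB :: CheckSumA -> Byte -> WindowSize -> CheckSumB -> CheckSumB
rollB curA prevHead w prevB = prevB - fromIntegral w * fromIntegral prevHead + curA

-- Pack a into the low 16 bits and b into the high 16 bits.
calculateS :: CheckSumA -> CheckSumB -> CheckSum
calculateS a b = (fromIntegral a :: Word32) .|. shift (fromIntegral b :: Word32) 16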

rollNext :: WindowSize ->B.ByteString -> Byte -> CheckSumA -> CheckSumB -> [CheckSum]
rollNext w str prevHead prevA prevB =
  let
    curBlock = B.take (fromIntegral w) str
    curLast = B.last curBlock
    h = B.head str
    t = B.tail str
    a = rollA prevA prevHead curLast
    b = rollB a prevHead w prevB
    s = calculateS a b
  in if fromIntegral (B.length t) < w
      then [s]
      else s : rollNext w t h a b

newABS :: WindowSize -> B.ByteString -> (CheckSumA, CheckSumB, CheckSum)
newABS w str =
  let a = newA w str
      b = newB w str
      s = calculateS a b
   in (a,b,s)
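One way to gain confidence in a rolling implementation like this is to compare it against recomputing every window from scratch on input that is not an arithmetic progression (bytes such as "abcdef" can hide weighting mistakes). The following is a self-contained sketch on plain lists; the names `direct`, `slide`, `pack32`, `slow`, and `fast` are hypothetical and it is not a drop-in replacement for the code above.

```haskell
import Data.Bits (shiftL, (.|.))
import Data.List (tails)
import Data.Word (Word8, Word16, Word32)

-- (a, b) for one window, computed from scratch.
-- Word16 arithmetic wraps around, which gives "mod 2^16" for free.
direct :: [Word8] -> (Word16, Word16)
direct xs = (a, b)
  where
    ys = map fromIntegral xs :: [Word16]
    n  = length xs
    a  = sum ys
    -- First byte gets weight n, last byte gets weight 1.
    b  = sum (zipWith (*) (map fromIntegral [n, n - 1 .. 1]) ys)

-- Slide a window of size w by one byte: `old` leaves, `new` enters.
slide :: Int -> (Word16, Word16) -> Word8 -> Word8 -> (Word16, Word16)
slide w (a, b) old new = (a', b')
  where
    a' = a - fromIntegral old + fromIntegral new
    b' = b - fromIntegral w * fromIntegral old + a'

-- a in the low 16 bits, b in the high 16 bits.
pack32 :: (Word16, Word16) -> Word32
pack32 (a, b) = fromIntegral a .|. (fromIntegral b `shiftL` 16)

-- Reference version: recompute every full window.
slow :: Int -> [Word8] -> [Word32]
slow w xs
  | w <= 0    = []
  | otherwise = [ pack32 (direct win)
                | t <- tails xs, let win = take w t, length win == w ]

-- Rolling version: one from-scratch window, then incremental updates.
fast :: Int -> [Word8] -> [Word32]
fast w xs
  | w <= 0 || length xs < w = []
  | otherwise = map pack32 (scanl step (direct (take w xs)) (zip xs (drop w xs)))
  where
    step ab (old, new) = slide w ab old new
```

If `fast w xs == slow w xs` holds for a range of inputs and window sizes, the incremental update for b is at least consistent with the from-scratch weighting.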