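Comparing file contents in F#

Asked: 2010-12-20 20:28:06

Tags: .net f# filestream

I wrote a quick and dirty function to compare file contents (BTW, I have already checked that the files are the same size):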

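open System.IO

let eqFiles f1 f2 =
  // Both files are read into memory in full before comparing
  let bytes1 = Seq.ofArray (File.ReadAllBytes f1)
  let bytes2 = Seq.ofArray (File.ReadAllBytes f2)
  // compareWith returns 0 only when every pair of bytes matches
  let res = Seq.compareWith (fun x y -> (int x) - (int y)) bytes1 bytes2
  res = 0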

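I am not happy about reading the whole content into arrays. I would rather have a lazy sequence of bytes, but I could not find a suitable API in F#.

6 answers:

Answer 0 (score: 9)

If you want to use the full power of F#, you can also do it asynchronously. The idea is to read blocks of a specified size from both files asynchronously and then compare the blocks (using the standard, simple comparison of byte arrays).

This is actually an interesting problem, because you need something like an asynchronous sequence: a sequence of Async<T> values that are generated on demand, but without blocking a thread the way a plain seq<T> or iteration does. The declaration of the asynchronous sequence and the function that reads the data may look like this:

EDIT: I also posted the snippet to http://fssnip.net/1k, which has nicer F# formatting :-)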

open System.IO

/// Represents a sequence of values 'T where items 
/// are generated asynchronously on-demand
type AsyncSeq<'T> = Async<AsyncSeqInner<'T>> 
and AsyncSeqInner<'T> =
  | Ended
  | Item of 'T * AsyncSeq<'T>

/// Read file 'fn' in blocks of size 'size'
/// (returns on-demand asynchronous sequence)
let readInBlocks fn size = async {
  let stream = File.OpenRead(fn)
  let buffer = Array.zeroCreate size

  /// Returns next block as 'Item' of async seq
  let rec nextBlock() = async {
    let! count = stream.AsyncRead(buffer, 0, size)
    if count = 0 then return Ended
    else 
      // Create buffer with the right size
      let res = 
        if count = size then buffer
        else buffer |> Seq.take count |> Array.ofSeq
      return Item(res, nextBlock()) }

  return! nextBlock() }

The asynchronous workflow that does the comparison is then quite simple:

let rec compareBlocks seq1 seq2 = async {
  let! item1 = seq1
  let! item2 = seq2
  match item1, item2 with 
  | Item(b1, ns1), Item(b2, ns2) when b1 <> b2 -> return false
  | Item(b1, ns1), Item(b2, ns2) -> return! compareBlocks ns1 ns2
  | Ended, Ended -> return true
  | _ -> return failwith "Size doesn't match" }

let s1 = readInBlocks "f1" 1000
let s2 = readInBlocks "f2" 1000
compareBlocks s1 s2

Answer 1 (score: 6)

This compares the files byte by byte and short-circuits as soon as a difference is found. It also handles files of different sizes.

let rec compareFiles (fs1: FileStream) (fs2: FileStream) =
      match fs1.ReadByte(),fs2.ReadByte() with
      | -1,-1 -> true //all bytes have been enumerated and were all equal
      | _,-1 -> false //the files are of different length
      | -1,_ -> false //the files are of different length
      | x,y when x <> y -> false
             //only continue to the next bytes when the present two are equal 
      | _ -> compareFiles fs1 fs2 
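
The function above expects streams that are already open, so the caller has to open and dispose of them. A minimal sketch of such a caller follows. The wrapper name compareFilesByPath and the up-front length check are assumptions for illustration, not part of the answer, and compareFiles is repeated in condensed form only so the snippet runs on its own:

```fsharp
open System.IO

// Condensed copy of the answer's function: the guard 'x <> y' also
// covers the case where only one stream has reached its end (-1).
let rec compareFiles (fs1: FileStream) (fs2: FileStream) =
    match fs1.ReadByte(), fs2.ReadByte() with
    | -1, -1 -> true                  // both streams ended, no difference found
    | x, y when x <> y -> false       // different byte, or different length
    | _ -> compareFiles fs1 fs2       // equal so far, keep going

// Hypothetical wrapper (not in the original answer): opens both files,
// compares them, and disposes the streams when it returns.
let compareFilesByPath (path1: string) (path2: string) =
    // Cheap early exit: files of different length cannot be equal
    if FileInfo(path1).Length <> FileInfo(path2).Length then false
    else
        use fs1 = File.OpenRead path1
        use fs2 = File.OpenRead path2
        compareFiles fs1 fs2
```

Because of the `use` bindings, both handles are released even when the comparison stops early at the first differing byte.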

Answer 2 (score: 1)

You have to stream the files and walk through them block by block. The File and Stream classes in .NET (and Stream's descendants such as StreamReader and so on) give you what you need.
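
To make the block-by-block idea concrete, here is a minimal synchronous sketch. The names readFull and filesEqualByBlocks and the 4096-byte block size are assumptions chosen for illustration; none of them come from the answer. Stream.Read may return fewer bytes than requested, so the helper keeps reading until the buffer is full or the stream ends:

```fsharp
open System.IO

/// Hypothetical helper: fills 'buffer' as far as possible and returns the
/// number of bytes read (less than buffer.Length only at end of stream).
let readFull (stream: Stream) (buffer: byte[]) =
    let rec loop offset =
        if offset = buffer.Length then offset
        else
            let n = stream.Read(buffer, offset, buffer.Length - offset)
            if n = 0 then offset else loop (offset + n)
    loop 0

/// Hypothetical function: compares two files one block at a time and
/// stops at the first block that differs.
let filesEqualByBlocks (path1: string) (path2: string) =
    use s1 = File.OpenRead path1
    use s2 = File.OpenRead path2
    let b1 : byte[] = Array.zeroCreate 4096
    let b2 : byte[] = Array.zeroCreate 4096
    // True when the first 'count' bytes of both buffers are identical
    let sameBytes count =
        let rec go i = i = count || (b1.[i] = b2.[i] && go (i + 1))
        go 0
    let rec loop () =
        let n1 = readFull s1 b1
        let n2 = readFull s2 b2
        if n1 <> n2 then false          // one file ended before the other
        elif n1 = 0 then true           // both ended, no difference found
        elif not (sameBytes n1) then false
        else loop ()
    loop ()
```

Only two fixed-size buffers are kept in memory, regardless of how large the files are.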

Answer 3 (score: 1)

As others have already said, use streams for lazy I/O, e.g.

open System

let seqOfFstream (fstream: IO.FileStream) = seq {
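    let currentByte = ref 0
    while currentByte.Value >= 0 do
        currentByte.Value <- fstream.ReadByte()
        // The final -1 (end of stream) is yielded too, so a shorter
        // file shows up as a difference in Seq.exists2 below
        yield currentByte.Value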
}

let fileEq fname1 fname2 =
    use f1 = IO.File.OpenRead fname1
    use f2 = IO.File.OpenRead fname2    
    not (Seq.exists2 (fun a b -> a <> b) (seqOfFstream f1) (seqOfFstream f2))

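Answer 4 (score: 0)

You don't need anything new in F#. I would just define a sequence that yields the bytes using a FileStream underneath instead of File.ReadAllBytes. Then you can compare two such sequences "the F# way".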
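
One way this could look is sketched below. The names bytesOf and eqFilesLazy and the 4096-byte buffer are assumptions made for illustration, not something the answer specifies. The sequence reads the file in chunks but hands the bytes out one at a time, so it can be passed straight to Seq.compareWith as in the question:

```fsharp
open System.IO

/// Hypothetical helper: lazily yields the bytes of the file at 'path'.
/// The stream is opened when enumeration starts and disposed when the
/// enumeration finishes or is abandoned.
let bytesOf (path: string) : seq<byte> = seq {
    use stream = File.OpenRead path
    let buffer : byte[] = Array.zeroCreate 4096
    let rec blocks () = seq {
        let count = stream.Read(buffer, 0, buffer.Length)
        if count > 0 then
            // Copy the valid part so the next read cannot overwrite it
            yield! Array.sub buffer 0 count
            yield! blocks () }
    yield! blocks () }

/// Hypothetical lazy variant of the question's eqFiles. Seq.compareWith
/// stops at the first difference and returns a non-zero result when one
/// sequence is shorter than the other.
let eqFilesLazy (f1: string) (f2: string) =
    Seq.compareWith (fun (x: byte) (y: byte) -> compare x y) (bytesOf f1) (bytesOf f2) = 0
```

Nothing is read until Seq.compareWith starts enumerating, and at most one buffer per file is held at any time.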

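Answer 5 (score: 0)

This is an adjustment to the accepted answer by Tomas Petricek. You may ask where the streams are closed. They aren't. In my case this caused handle leaks and sharing violation problems. I solved it by changing the signature of the readInBlocks function, which moves the responsibility for opening and closing the streams to the calling method. From: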

let readInBlocks fn size =
[...]

To:

let readInBlocks (stream:FileStream) size = 
[...]

The compare-file method is then responsible for disposing of the streams:

let compareFile (filePath1, filePath2) =
    use stream1 = File.OpenRead(filePath1)
    use stream2 = File.OpenRead(filePath2)
    let s1 = readInBlocks stream1 1000        
    let s2 = readInBlocks stream2 1000
    let isEqual =
        compareBlocks s1 s2
        |> Async.RunSynchronously                
    isEqual

The complete adjusted code:

open System.IO

/// Represents a sequence of values 'T where items 
/// are generated asynchronously on-demand
type AsyncSeq<'T> = Async<AsyncSeqInner<'T>> 
and AsyncSeqInner<'T> =
  | Ended
  | Item of 'T * AsyncSeq<'T>

/// Read 'stream' in blocks of size 'size'
/// (returns on-demand asynchronous sequence)
let readInBlocks (stream:FileStream) size = 
    async {                            
        let buffer = Array.zeroCreate size
        /// Returns next block as 'Item' of async seq
        let rec nextBlock() = 
            async {
                let! count = stream.AsyncRead(buffer, 0, size)
                if count = 0 then return Ended
                else 
                    // Create buffer with the right size
                    let res = 
                        if count = size then buffer
                        else buffer |> Seq.take count |> Array.ofSeq
                    return Item(res, nextBlock()) 
            }
        return! nextBlock()
    }

/// Asynchronous function that compares two asynchronous sequences
/// item by item. If an item doesn't match, 'false' is returned
/// immediately without generating the rest of the sequence. If the
/// lengths don't match, exception is thrown.
let rec compareBlocks seq1 seq2 = async {
  let! item1 = seq1
  let! item2 = seq2
  match item1, item2 with 
  | Item(b1, ns1), Item(b2, ns2) when b1 <> b2 -> return false
  | Item(b1, ns1), Item(b2, ns2) -> return! compareBlocks ns1 ns2
  | Ended, Ended -> return true
  | _ -> return failwith "Size doesn't match" }

/// Compare two files using 1k blocks
let compareFile (filePath1, filePath2) =
    use stream1 = File.OpenRead(filePath1)
    use stream2 = File.OpenRead(filePath2)
    let s1 = readInBlocks stream1 1000        
    let s2 = readInBlocks stream2 1000
    let isEqual =
        compareBlocks s1 s2
        |> Async.RunSynchronously                
    isEqual