How do I read a file into an in-memory collection using F#?

Asked: 2015-02-16 12:22:16

Tags: .net f# functional-programming streamreader

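I'd like to use F# to read a comma-separated file into memory, de-duplicate it on one field, and then write the result to a pipe-delimited file.

I wrote an example in C# of what I want the program to do: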

        var input = new StreamReader(@"D:\input.txt");
        var addresses = new Dictionary<string, AddressModel>();

        while (!input.EndOfStream)
        {
            var address = new AddressModel(input);
            if (!addresses.ContainsKey(address.Id))
                addresses.Add(address.Id, address);
        }

        var output = new StreamWriter(@"D:\CSharp.txt");
        foreach (var address in addresses.Values)
        {
            output.WriteLine(address.ToString());
        }

        output.Flush();

AddressModel is defined as:

    class AddressModel
    {
        public string Id { get; set; }
        public string StreetName { get; set; }
        public int ZipCode { get; set; }

        public AddressModel(StreamReader inputStream)
        {
            if (inputStream == null) return;

            var input = inputStream.ReadLine();
            if (input == null) return;
            var split = input.Split(new char[] { ',' }, StringSplitOptions.None);

            Id = split[0];
            ZipCode = int.Parse(split[1]);
            StreetName = BuildStreet(split);
        }

        private string BuildStreet(string[] items)
        {
            var street = "";
            if (!string.IsNullOrWhiteSpace(items[5]))
                street += items[5];
            if (!string.IsNullOrWhiteSpace(items[6]))
                street += string.IsNullOrWhiteSpace(street) ? items[6] : " " + items[6];
            if (!string.IsNullOrWhiteSpace(items[7]))
                street += string.IsNullOrWhiteSpace(street) ? items[7] : " " + items[7];
            if (!string.IsNullOrWhiteSpace(items[8]))
                street += string.IsNullOrWhiteSpace(street) ? items[8] : " " + items[8];
            return street;
        }

        public override string ToString()
        {
            return string.Format("{0}|{1}|{2}", Id, StreetName, ZipCode);
        }
    }

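So what I want the program to do is read the file line by line, construct a new AddressModel object from each line, check whether that item already exists in the dictionary, add it if it doesn't, and then write the contents of the dictionary to a second text file.

Of course, if I'm thinking in a way that is "too object-oriented" and this can be done in a more functional way, I'd appreciate it if someone could point me in the right direction.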

2 answers:

Answer 0: (score: 3)

You can write the main program like this:

open System
open System.Collections.Generic

// Assumes AddressModel has a constructor that takes a single line of text.
let lines = IO.File.ReadLines @"D:\input.txt"
let addresses = new Dictionary<string, AddressModel>()
lines |> Seq.iter (fun line -> 
    let address = AddressModel line
    // Keep only the first address seen for each Id.
    if not (addresses.ContainsKey address.Id) then
        addresses.Add (address.Id, address))
IO.File.WriteAllLines(@"D:\CSharp.txt", Seq.map string addresses.Values)

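As you can see, the structure is not very different from the C# version; the difference is that you can use higher-order functions such as map and iter.

As for your Address class, you can either reuse your C# class or write an F# function that parses each line: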
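For readers new to these two functions, here is a minimal sketch of Seq.map and Seq.iter on in-memory data. The sample list is made up for illustration and does not come from the question.

```fsharp
// Hypothetical sample data standing in for the lines of input.txt.
let sample = [ "a,1"; "b,2" ]

// Seq.map transforms every element and returns a new sequence.
let upper =
    sample
    |> Seq.map (fun (s: string) -> s.ToUpperInvariant())
    |> Seq.toList

// Seq.iter runs a side effect for every element and returns unit.
upper |> Seq.iter (printfn "%s")
```

Seq.map is the tool for the "turn each line into a value" step, while Seq.iter fits steps that only perform an action, such as adding to a dictionary.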

let parseLine (input:string) =
    let split = input.Split [|','|]
    let id, zipCode = split.[0], Int32.Parse split.[1]
    let street = 
        split.[5..8] 
        |> Array.filter (String.IsNullOrWhiteSpace >> not)
        |> String.concat " "
    (id, zipCode, street)

// Same field order as the C# ToString: Id|StreetName|ZipCode.
let printLine (id, zipCode, street) = sprintf "%s|%s|%i" id street zipCode
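As a quick check of the parser, the sketch below runs it on a made-up line. The field layout (id in field 0, zip code in field 1, street parts in fields 5 to 8) is taken from the C# class in the question; the sample values are hypothetical.

```fsharp
open System

// Same parser as in the answer, repeated so the snippet runs on its own.
let parseLine (input: string) =
    let split = input.Split [| ',' |]
    let id, zipCode = split.[0], Int32.Parse split.[1]
    let street =
        split.[5..8]
        |> Array.filter (String.IsNullOrWhiteSpace >> not)
        |> String.concat " "
    (id, zipCode, street)

// Hypothetical input line: fields 2 to 4 are ignored, field 7 is blank.
let parsed = parseLine "42,90210,x,y,z,12,Main,,Street"
// parsed = ("42", 90210, "12 Main Street")
```

The blank field is dropped by the filter, so the street parts are joined with exactly one space between them, which matches what BuildStreet does in the C# version.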

Then you can update your main program like this:

open System
open System.Collections.Generic

let lines = IO.File.ReadLines @"D:\input.txt"
let addresses = new Dictionary<string, (string*int*string)>()
lines |> Seq.map parseLine |> Seq.iter (fun ((id,_,_) as line) -> 
    if not (addresses.ContainsKey id) then
        addresses.Add (id, line))

IO.File.WriteAllLines(@"D:\CSharp.txt", Seq.map printLine addresses.Values)

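Finally, if the dictionary's only purpose is to get distinct IDs, you don't need that step at all. You can use Seq.distinctBy, as suggested in the other answer, which simplifies your code further to: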

let lines = 
    IO.File.ReadLines @"D:\input.txt"
    |> Seq.map parseLine 
    |> Seq.distinctBy (fun (id,_,_) -> id)

IO.File.WriteAllLines(@"D:\CSharp.txt", Seq.map printLine lines)
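To see that Seq.distinctBy behaves like the ContainsKey check, here is a small sketch on in-memory tuples. The rows are invented for illustration; only the (id, zipCode, street) shape comes from parseLine.

```fsharp
// Hypothetical parsed rows: (id, zipCode, street). Id "1" appears twice.
let rows =
    [ ("1", 12345, "Main Street")
      ("2", 54321, "Oak Avenue")
      ("1", 99999, "Duplicate Road") ]

// Keeps the first row seen for each id and discards later ones,
// preserving the original order of the remaining rows.
let unique =
    rows
    |> Seq.distinctBy (fun (id, _, _) -> id)
    |> Seq.toList
```

Like the dictionary loop, this keeps the first occurrence of each key, so the "Duplicate Road" row is dropped.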

Update

Here is the final suggested code:

open System 

let parseLine (input:string) =
    let split = input.Split [|','|]
    let id, zipCode = split.[0], Int32.Parse split.[1]
    let street = 
        split.[5..8] 
        |> Array.filter (String.IsNullOrWhiteSpace >> not)
        |> String.concat " "
    (id, zipCode, street)

// Same field order as the C# ToString: Id|StreetName|ZipCode.
let printLine (id, zipCode, street) = sprintf "%s|%s|%i" id street zipCode

let lines = 
    IO.File.ReadLines @"D:\input.txt"
    |> Seq.map parseLine 
    |> Seq.distinctBy (fun (id,_,_) -> id)

IO.File.WriteAllLines(@"D:\CSharp.txt", Seq.map printLine lines)

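Answer 1: (score: 0)

You can use Seq.distinctBy, which tracks the keys it has already seen in a hash-based collection internally, much like the Dictionary in your code: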

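open System.IO

type Contact = {Id:string; Name:string}
let lines = File.ReadLines(@"D:\input.txt")

// toContact and contactToStr are shown further down; in a real script
// they must be defined before this point.
let output = 
        lines 
        |> Seq.map toContact
        |> Seq.distinctBy (fun c -> c.Id)
        |> Seq.map contactToStr

File.WriteAllLines(@"D:\CSharp.txt", output)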

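This assumes you have a Contact type, a function that builds a Contact from a string (toContact), and a function that builds a string from a Contact (contactToStr), for example: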

let toContact (str:string) = 
        let values = str.Split(',')
        {Id = values.[0]; Name = values.[1]}
let contactToStr contact = sprintf "%s|%s" contact.Id contact.Name
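Putting the pieces of this answer in definition order gives something like the sketch below. To keep it runnable without any files, it uses a hypothetical in-memory list (sampleLines) in place of File.ReadLines; everything else follows the code above.

```fsharp
type Contact = { Id: string; Name: string }

// Build a Contact from one comma-separated line.
let toContact (str: string) =
    let values = str.Split(',')
    { Id = values.[0]; Name = values.[1] }

// Render a Contact as a pipe-delimited line.
let contactToStr (contact: Contact) = sprintf "%s|%s" contact.Id contact.Name

// Hypothetical input; a real program would read these with File.ReadLines.
let sampleLines = [ "1,Alice"; "2,Bob"; "1,Alice again" ]

let output =
    sampleLines
    |> Seq.map toContact
    |> Seq.distinctBy (fun c -> c.Id)
    |> Seq.map contactToStr
    |> Seq.toList
```

With real files, the last step would pass the sequence to File.WriteAllLines instead of converting it to a list.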