Question

我有一个txt文件格式为：

0.32423 1.3453 3.23423
0.12332 3.1231 9.23432432
9.234324234 -1.23432 12.23432
...

每一行都有三个双倍值。此文件中有超过10000行。我可以使用ReadStream.ReadLine并使用String.Split，然后转换它。我想知道有没有更快的方法来做到这一点。

最诚挚的问候，

Answer 1

StreamReader.ReadLine，String.Split和Double.TryParse听起来像是一个很好的解决方案无需改进。

Answer 2

您可以执行一些微观优化，但您建议的方式听起来很简单。

10000行不应该花很长时间 - 你试过它并发现你确实遇到了性能问题吗？例如，这里有两个简短的程序 - 一个创建10,000行文件，另一个读取它：

CreateFile.cs：

using System;
using System.IO;

public class Test
{
    static void Main()
    {
        Random rng = new Random();
        using (TextWriter writer = File.CreateText("test.txt"))
        {
            for (int i = 0; i < 10000; i++)
            {
                writer.WriteLine("{0} {1} {2}", rng.NextDouble(),
                                 rng.NextDouble(), rng.NextDouble());
            }
        }
    }
}

ReadFile.cs：

using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

public class Test
{
    static void Main()
    {   
        Stopwatch sw = Stopwatch.StartNew();
        using (TextReader reader = File.OpenText("test.txt"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] bits = line.Split(' ');
                foreach (string bit in bits)
                {
                    double value;
                    if (!double.TryParse(bit, out value))
                    {
                        Console.WriteLine("Bad value");
                    }
                }
            }
        }
        sw.Stop();
        Console.WriteLine("Total time: {0}ms",
                          sw.ElapsedMilliseconds);
    }
}

在我的上网本（确实有一个SSD）中，只需要82毫秒来读取文件。我建议这可能不是问题：）

Answer 3

我建议您使用

一次阅读所有行

string[] lines = System.IO.File.ReadAllLines(fileName);

这将确保I / O以最高效率完成。你必须测量（配置文件），但我希望转换时间要少得多。

Answer 4

你的方法已经很好了！

您可以通过编写返回double数组的readline函数来改进它，并在其他程序中重用此函数。

Answer 5

这个解决方案有点慢（最后看基准测试），但阅读效果更好。它也应该是更高的内存效率，因为当时只缓冲当前字符（而不是整个文件或行）。

读取数组是本读者的一个附加功能，它假设数组的大小始终作为int值出现。

IParsable是另一项功能，可以轻松实现各种类型的Parse方法。

class StringSteamReader {
    private StreamReader sr;

    public StringSteamReader(StreamReader sr) {
        this.sr = sr;
        this.Separator = ' ';
    }

    private StringBuilder sb = new StringBuilder();
    public string ReadWord() {
        eol = false;
        sb.Clear();
        char c;
        while (!sr.EndOfStream) {
            c = (char)sr.Read();
            if (c == Separator) break;
            if (IsNewLine(c)) {
                eol = true;
                char nextch = (char)sr.Peek();
                while (IsNewLine(nextch)) {
                    sr.Read(); // consume all newlines
                    nextch = (char)sr.Peek();
                }
                break;
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    private bool IsNewLine(char c) {
        return c == '\r' || c == '\n';
    }

    public int ReadInt() {
        return int.Parse(ReadWord());
    }

    public double ReadDouble() {
        return double.Parse(ReadWord());
    }

    public bool EOF {
        get { return sr.EndOfStream; }
    }

    public char Separator { get; set; }

    bool eol;
    public bool EOL {
        get { return eol || sr.EndOfStream; }
    }

    public T ReadObject<T>() where T : IParsable, new() {
        var obj = new T();
        obj.Parse(this);
        return obj;
    }

    public int[] ReadIntArray() {
        int size = ReadInt();
        var a = new int[size];
        for (int i = 0; i < size; i++) {
            a[i] = ReadInt();
        }
        return a;
    }

    public double[] ReadDoubleArray() {
        int size = ReadInt();
        var a = new double[size];
        for (int i = 0; i < size; i++) {
            a[i] = ReadDouble();
        }
        return a;
    }

    public T[] ReadObjectArray<T>() where T : IParsable, new() {
        int size = ReadInt();
        var a = new T[size];
        for (int i = 0; i < size; i++) {
            a[i] = ReadObject<T>();
        }
        return a;
    }

    internal void NextLine() {
        eol = false;
    }
}

interface IParsable {
    void Parse(StringSteamReader r);
}

可以像这样使用：

public void Parse(StringSteamReader r) {
    double x = r.ReadDouble();
    int y = r.ReadInt();
    string z = r.ReadWord();
    double[] arr = r.ReadDoubleArray();
    MyParsableObject o = r.ReadObject<MyParsableObject>();
    MyParsableObject [] oarr = r.ReadObjectArray<MyParsableObject>();
}

我做了一些基准测试，将StringStreamReader与已经提出的其他方法进行了比较（StreamReader.ReadLine和File.ReadAllLines）。以下是我用于基准测试的方法：

private static void Test_StringStreamReader(string filename) {
    var sw = new Stopwatch();
    sw.Start();
    using (var sr = new StreamReader(new FileStream(filename, FileMode.Open, FileAccess.Read))) {
        var r = new StringSteamReader(sr);
        r.Separator = ' ';
        while (!r.EOF) {
            var dbls = new List<double>();
            while (!r.EOF) {
                dbls.Add(r.ReadDouble());
            }
        }
    }
    sw.Stop();
    Console.WriteLine("elapsed: {0}", sw.Elapsed);
}

private static void Test_ReadLine(string filename) {
    var sw = new Stopwatch();
    sw.Start();
    using (var sr = new StreamReader(new FileStream(filename, FileMode.Open, FileAccess.Read))) {
        var dbls = new List<double>();

        while (!sr.EndOfStream) {
            string line = sr.ReadLine();
            string[] bits = line.Split(' ');
            foreach(string bit in bits) {
                dbls.Add(double.Parse(bit));
            }
        }
    }
    sw.Stop();
    Console.WriteLine("elapsed: {0}", sw.Elapsed);
}

private static void Test_ReadAllLines(string filename) {
    var sw = new Stopwatch();
    sw.Start();
    string[] lines = System.IO.File.ReadAllLines(filename);
    var dbls = new List<double>();
    foreach(var line in lines) {
        string[] bits = line.Split(' ');
        foreach (string bit in bits) {
            dbls.Add(double.Parse(bit));
        }
    }        
    sw.Stop();
    Console.WriteLine("Test_ReadAllLines: {0}", sw.Elapsed);
}

我使用了一个带有1.000.000行双值的文件（每行3个值）。文件位于SSD磁盘上，每个测试在发布模式下重复多次。这些是结果（平均）：

Test_StringStreamReader: 00:00:01.1980975
Test_ReadLine:           00:00:00.9117553
Test_ReadAllLines:       00:00:01.1362452

因此，如上所述，StringStreamReader比其他方法慢一点。对于10.000行，性能大约为（120ms / 95ms / 100ms）。

从文件C＃中读取double值

5 个答案: