Best way to read a BIG text file with a CRLF line delimiter

Date: 2016-05-10 12:56:52

Tags: c# asp.net regex readline

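I have a very large comma-delimited text file. Each field is separated by a comma and enclosed in quotes (all fields are strings). The problem is that some fields contain CRs because their value spans multiple lines. So when I call ReadLine, it stops at that CR. It would be great if I could tell it to stop only at the CRLF combination.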

Does anyone have a snappy way to do this? The file can be very large.
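To see why ReadLine stops early: .NET's TextReader.ReadLine (and therefore StreamReader.ReadLine) treats a lone "\r", a lone "\n", and "\r\n" all as line terminators. The small sketch below (the class and method names are made up for illustration) collects what ReadLine returns for a two-record sample whose second field contains a bare CR.

```csharp
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Hypothetical helper: returns every line that TextReader.ReadLine produces for the text.
    public static List<string> ReadAllWithReadLine(string text)
    {
        var result = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                result.Add(line);
        }
        return result;
    }
}
```

For the input `"a","multi` + CR + `line"` + CRLF + `"b","c"` this yields three lines instead of two, because the bare CR inside the quoted field is treated as a line break.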

2 answers:

Answer 0 (score: 2)

If you want a specialized ReadLine, why not implement it yourself?

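  using System;
  using System.Collections.Generic;
  using System.IO;
  using System.Text;

  public static class MyFileReader {
    // Yields lines terminated only by "\r\n"; a lone '\r' (or '\n') stays inside the line.
    public static IEnumerable<String> ReadLineCRLF(String path) {
      StringBuilder sb = new StringBuilder();

      Char prior = '\0';
      Char current = '\0';

      using (StreamReader reader = new StreamReader(path)) {
        // Read one character at a time until end of file.
        while (true) {
          int v = reader.Read();

          if (v < 0) {
            // End of file: a '\r' held back from the last read belongs to the line.
            if (current == '\r')
              sb.Append(current);

            // Emit the last line unless the file ended exactly with "\r\n".
            if (sb.Length > 0)
              yield return sb.ToString();

            yield break;
          }

          prior = current;
          current = (Char) v;

          if ((current == '\n') && (prior == '\r')) {
            // "\r\n": end of line; neither character is part of the line.
            yield return sb.ToString();

            sb.Clear();
          }
          else {
            // A '\r' is held back until we know it is not followed by '\n'.
            if (prior == '\r')
              sb.Append(prior);

            if (current != '\r')
              sb.Append(current);
          }
        }
      }
    }
  }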

Then use it:

  var lines = MyFileReader
    .ReadLineCRLF(@"C:\MyData.txt"); 
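A possible variant of the same idea, sketched under the assumption that you would rather pass in any TextReader than a path (so it works with a StreamReader over a file and can be tried out with a StringReader). `CrlfLineReader.ReadLines` is a hypothetical name, and it tracks a pending CR with a single bool instead of the `prior`/`current` pair.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CrlfLineReader
{
    // Hypothetical helper: yields records terminated only by "\r\n".
    // A bare '\r' or '\n' inside a record is kept as part of that record.
    // The caller owns (and disposes) the reader.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        var sb = new StringBuilder();
        bool pendingCr = false; // true when the previous character was an undecided '\r'
        int v;

        while ((v = reader.Read()) != -1)
        {
            char c = (char)v;

            if (pendingCr)
            {
                pendingCr = false;
                if (c == '\n')
                {
                    // "\r\n" ends the record; neither character is kept.
                    yield return sb.ToString();
                    sb.Clear();
                    continue;
                }
                // The held '\r' was not followed by '\n', so it is field data.
                sb.Append('\r');
            }

            if (c == '\r')
                pendingCr = true; // decide once the next character is known
            else
                sb.Append(c);
        }

        // End of input: a trailing lone '\r' is data; emit any unterminated last record.
        if (pendingCr)
            sb.Append('\r');
        if (sb.Length > 0)
            yield return sb.ToString();
    }
}
```

With a file this would be used as `using (var reader = new StreamReader(@"C:\MyData.txt")) { foreach (var record in CrlfLineReader.ReadLines(reader)) { ... } }`. Because it is an iterator, only the current record is held in memory.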

Answer 1 (score: 1)

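How about reading the whole file at once with

  // Reads the entire file into memory as a single string
  string text = File.ReadAllText("input.txt");

and then splitting it on the carriage return/line feed pair, like this:

  // Split only on "\r\n", so a lone '\r' inside a field stays in its record
  var split = text.Split(new[] { "\r\n" }, StringSplitOptions.None);

and then processing it line by line in a loop:

  foreach (var line in split) { ... }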
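Put together, the read-then-split approach could look roughly like the sketch below. `SplitOnCrlf.SplitRecords` is a hypothetical name; it assumes the text is already in memory and also drops the single empty entry that `String.Split` produces when the text ends with "\r\n".

```csharp
using System;

public static class SplitOnCrlf
{
    // Hypothetical helper: splits text on "\r\n" only, so a bare '\r' inside a field is kept.
    public static string[] SplitRecords(string text)
    {
        string[] parts = text.Split(new[] { "\r\n" }, StringSplitOptions.None);

        // Text ending in "\r\n" (or empty text) leaves one trailing empty entry; drop just that one.
        if (parts.Length > 0 && parts[parts.Length - 1].Length == 0)
            Array.Resize(ref parts, parts.Length - 1);

        return parts;
    }
}
```

Usage would be `foreach (var record in SplitOnCrlf.SplitRecords(File.ReadAllText("input.txt"))) { ... }`. Since the whole file is held in memory (once as one string and again as the split array), this suits files that fit comfortably in RAM; for very large files a streaming reader avoids that cost.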