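Remove unnecessary line breaks from a file

Time: 2015-12-03 14:18:17

Tags: c# regex

I am reading an input text file in C#. The file uses "|" as the column delimiter and '\n' as the row delimiter. Here is the test data: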

1001 | Name | XYZ | Department1 Roll no 1. (\r\n)
1002 | Name | ABC | Department2 Roll No 2. (\r\n)
1003 | Name | PQR | Department3 (\r\n)
Roll (\r\n)
no3. (\r\n)
1004 | Name | MNO | Department4 Roll No 4. (\r\n)
1005 | Name | DEF | Department5 Roll No 5. (\r\n)

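The first two records are formatted correctly. The third record, however, contains stray line breaks. I want to format it like my other records.

I wrote the following C# code for this: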

string text = File.ReadAllText(inputfile);
text = text.Replace(@"\r\n", " ");
File.WriteAllText(outputfile, text);

However, it does not work for me. Can anyone help me solve this?

More sample data: [image]

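Can we do this with a regular expression?

2 answers:

Answer 0: (score: 1)

Use File.ReadAllLines and process the lines in reverse, as Sergii mentioned. This lets you check each line to see whether it matches the expected format or whether it was created by a misplaced line break. If the current line is the result of a misplaced line break, you simply append it to the previous line to get the resulting output.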

static void ProcessFile(string inputfile, string outputfile)
{
    // Read the files by lines.
    string[] lines = File.ReadAllLines(inputfile);

    // We'll process in reverse, so create a stack (LIFO) for the results.
    Stack<string> results = new Stack<string>();

    // Process each line, checking that if it doesn't match the format, then we append to previous line.
    string resultLine = "";
    for (int i = lines.Length - 1; i >= 0; --i)
    {
        resultLine = lines[i] + resultLine;
        int lineParts = resultLine.Split('|').Length; // Array length; avoids needing System.Linq for Count().
        if (lineParts == 4) // Well-formatted line.
        {
            results.Push(resultLine);
            resultLine = "";
        }
        else if (lineParts < 4) // An invalid linefeed from the previous entry.
        {
            // We prepend a space to replace the linebreak; then just continue through loop, where the current line will be appended to previous.
            resultLine = " " + resultLine;
        }
        else // lineParts > 4... unexpected
        {
            throw new InvalidOperationException("What to do here?");
        }
    }

    // Now that all our lines have been fixed, write them back out.
    File.WriteAllLines(outputfile, results.ToArray());
}

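Note: this is not the most efficient approach, because the file must be small enough to fit in memory roughly three times over, but that is only one more copy than your original solution needs. If your file is large, you may want to modify the solution to operate on streams instead of holding everything in local variables.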
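
The streaming variant mentioned in the note could look roughly like the sketch below. It is not the answer's code: it reads forward rather than in reverse, and it assumes that stray line breaks only occur inside the last column (as in the sample data), so that any line with exactly four '|'-separated parts starts a new record. The names RecordJoiner, JoinBrokenRecords, and ProcessFileStreaming are hypothetical.

```csharp
using System;
using System.IO;

static class RecordJoiner
{
    // Hypothetical helper: copies records from reader to writer, merging each
    // continuation line into the record before it. Only one record is held in memory.
    // Assumption: a line with exactly expectedColumns parts starts a new record.
    public static void JoinBrokenRecords(TextReader reader, TextWriter writer, int expectedColumns = 4)
    {
        string pending = null; // Record being assembled, not yet written.
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            bool startsRecord = line.Split('|').Length == expectedColumns;
            if (startsRecord)
            {
                // The previous record is complete, so write it out.
                if (pending != null) writer.WriteLine(pending);
                pending = line;
            }
            else
            {
                // Continuation line: a space replaces the stray line break.
                pending = pending == null ? line : pending + " " + line;
            }
        }
        if (pending != null) writer.WriteLine(pending);
    }

    // File-based wrapper around the helper above.
    public static void ProcessFileStreaming(string inputfile, string outputfile)
    {
        using (var reader = new StreamReader(inputfile))
        using (var writer = new StreamWriter(outputfile))
        {
            JoinBrokenRecords(reader, writer);
        }
    }
}
```

If a line break can also fall inside one of the first three columns, this forward pass will misjoin records, and the reverse pass above is the safer choice.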

Answer 1: (score: 0)

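// Requires: using System; using System.Collections.Generic; using System.IO; using System.Linq; using System.Text;
var text = File.ReadAllText(inputfile);
// Split on both Windows and Unix line endings so that no stray '\r' is left behind.
var rawParts = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
// The first two records are already well formed.
var proParts = new List<string>(rawParts.Take(2));
// Join the three fragments of the broken third record (indices hard-coded for the sample data).
proParts.Add(rawParts[2] + " " + rawParts[3] + " " + rawParts[4]);
proParts.AddRange(rawParts.Skip(5));
var sb = new StringBuilder();
foreach (var part in proParts)
  sb.Append(part + "\n");
File.WriteAllText(outputfile, sb.ToString());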
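
On the question's request for a regular expression, one possible sketch is shown below. It is not taken from either answer, and it assumes that every genuine record starts with a numeric ID followed by "|", as in the sample data. The names LineBreakRegex and RemoveStrayBreaks are hypothetical. The pattern replaces a line break with a space unless the break is followed by a record start or by the end of the text.

```csharp
using System.Text.RegularExpressions;

static class LineBreakRegex
{
    // A line break NOT followed by "digits |" (assumed record start) or end of input.
    static readonly Regex StrayBreak = new Regex(@"\r?\n(?!\d+\s*\||\z)");

    // Hypothetical helper: returns the text with stray line breaks replaced by spaces.
    public static string RemoveStrayBreaks(string text)
    {
        return StrayBreak.Replace(text, " ");
    }
}
```

The verbatim string works here because the regex engine itself interprets \r and \n. string.Replace does no such interpretation, so in the question's code @"\r\n" searches for the four literal characters backslash, r, backslash, n.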