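How can StreamReader read all characters, including the 0x0D 0x0A characters?
I have an old .txt file that I am trying to convert. Many lines (but not all) end with "0x0D 0x0D 0x0A".
This code reads all the lines.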
StreamReader srFile = new StreamReader(gstPathFileName);
while (!srFile.EndOfStream) {
    string stFileContents = srFile.ReadLine();
    ...
}
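This produces an extra "" string between each pair of .txt lines. Since there are some blank lines between paragraphs, removing every "" string would also remove those blank lines.
Is there a way to have StreamReader read all characters, including the "0x0D 0x0D 0x0A"?
Edit two hours later... the file is large, 1.6MB.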
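For illustration, here is a minimal sketch of where the extra "" strings come from. ReadLine treats "\r", "\n", and "\r\n" each as a line terminator, so "0x0D 0x0D 0x0A" is read as two terminators with an empty line between them. The sample text and the names ReadLineDemo and ReadWithReadLine are made up for this example, and a StringReader stands in for the StreamReader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ReadLineDemo
{
    // Collects every string that ReadLine returns for the given text.
    public static List<string> ReadWithReadLine(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    public static void Main()
    {
        // Hypothetical sample: "a" ends with 0x0D 0x0D 0x0A, "b" ends with 0x0D 0x0A,
        // then comes one genuine blank line, then a final line "c".
        string sample = "a\r\r\nb\r\n\r\nc";
        foreach (string line in ReadWithReadLine(sample))
        {
            Console.WriteLine("[" + line + "]");
        }
        // Prints [a], [], [b], [], [c] on separate lines: the first [] is an
        // artifact of the doubled 0x0D, the second is the real blank line.
    }
}
```

Once ReadLine has returned, the artifact and the genuine blank line are indistinguishable, which is why filtering out every "" also loses the paragraph breaks.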
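Answer 0: (score: 1)
A very simple reimplementation of ReadLine. I made a version that returns IEnumerable<string> because it is easier. I put it in an extension method, hence the static class. The code is heavily commented, so it should be easy to read.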
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class StreamEx
{
    public static string[] ReadAllLines(this TextReader tr, string separator)
    {
        return tr.ReadLines(separator).ToArray();
    }

    // StreamReader is based on TextReader
    public static IEnumerable<string> ReadLines(this TextReader tr, string separator)
    {
        // Handling of empty file: old remains null
        string old = null;
        // Read buffer
        var buffer = new char[128];
        while (true)
        {
            // If we already read something
            if (old != null)
            {
                // Look for the separator (ordinal comparison, so "\r\n" is matched
                // character by character regardless of the current culture)
                int ix = old.IndexOf(separator, StringComparison.Ordinal);
                // If found
                if (ix != -1)
                {
                    // Return the piece of line before the separator
                    yield return old.Remove(ix);
                    // Then remove the piece of line before the separator plus the separator
                    old = old.Substring(ix + separator.Length);
                    // And continue
                    continue;
                }
            }
            // old doesn't contain any separator, let's read some more chars
            int read = tr.ReadBlock(buffer, 0, buffer.Length);
            // If there are no more chars to read, break the cycle
            if (read == 0)
            {
                break;
            }
            // Add the just read chars to the old chars
            // note that null + "somestring" == "somestring"
            old += new string(buffer, 0, read);
            // A new "round" of the while cycle will search for the separator
        }
        // Now we have to handle chars after the last separator
        // If we read something
        if (old != null)
        {
            // Return all the remaining characters
            yield return old;
        }
    }
}
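Note that, as written, it won't directly solve your problem :-) but it lets you choose which separator to use. So you use "\r\n" and then trim the excess '\r'.
Use it like this: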
using (var sr = new StreamReader("somefile"))
{
    // Little LINQ to strip excess \r and to make an array
    // (note that by making an array you'll put all the file
    // in memory)
    string[] lines = sr.ReadLines("\r\n").Select(x => x.TrimEnd('\r')).ToArray();
}
or
using (var sr = new StreamReader("somefile"))
{
    // Little LINQ to strip excess \r
    // (note that the file will be read line by line, so only
    // a line at a time is in memory, plus some remaining characters
    // of the next line in the old buffer)
    IEnumerable<string> lines = sr.ReadLines("\r\n").Select(x => x.TrimEnd('\r'));
    foreach (string line in lines)
    {
        // Do something
    }
}
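To see what "split on "\r\n", then trim the excess '\r'" does to mixed line endings, here is a small in-memory sketch. It uses the built-in string.Split as a stand-in for the ReadLines extension method; the sample text and the names SeparatorDemo and SplitAndTrim are assumptions made for the example.

```csharp
using System;
using System.Linq;

public static class SeparatorDemo
{
    // Splits on "\r\n" only, then removes any '\r' left at the end of each piece.
    public static string[] SplitAndTrim(string text)
    {
        return text
            .Split(new[] { "\r\n" }, StringSplitOptions.None)
            .Select(x => x.TrimEnd('\r'))
            .ToArray();
    }

    public static void Main()
    {
        // Hypothetical sample: "a" ends with \r\r\n, "b" ends with \r\n,
        // then one genuine blank line, then "c".
        string[] lines = SplitAndTrim("a\r\r\nb\r\n\r\nc");
        foreach (string line in lines)
        {
            Console.WriteLine("[" + line + "]");
        }
        // Prints [a], [b], [], [c]: the blank line between paragraphs survives,
        // and no extra empty string appears after "a".
    }
}
```

The doubled 0x0D never produces a line of its own here, because a lone '\r' is not a separator; it just rides along at the end of the previous line until TrimEnd removes it.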
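Answer 1: (score: 0)
You can always use a BinaryReader and manually read the bytes of a line one at a time. Keep the bytes, then when you encounter 0x0d 0x0d 0x0a, create a new string from the bytes of the current line.
Note: I am assuming the encoding is Encoding.UTF8, but your situation may differ. Accessing the bytes directly, I don't know offhand how to interpret the encoding. Here it is: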
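public static IEnumerable<string> ReadLinesFromStream(string fileName)
{
    // Open the file read-only
    using ( var fileStream = File.OpenRead(fileName) )
    using ( BinaryReader binaryReader = new BinaryReader(fileStream) )
    {
        var bytes = new List<byte>();
        // Loop until the end of the stream; PeekChar() is avoided because it
        // decodes characters and can fail on arbitrary bytes
        while ( fileStream.Position < fileStream.Length )
        {
            bytes.Add(binaryReader.ReadByte());
            // Only 0x0d 0x0d 0x0a ends a line here; a plain 0x0d 0x0a
            // stays inside the current string
            bool newLine = bytes.Count > 2
                && bytes[bytes.Count - 3] == 0x0d
                && bytes[bytes.Count - 2] == 0x0d
                && bytes[bytes.Count - 1] == 0x0a;
            if ( newLine )
            {
                // Decode everything before the three terminator bytes
                yield return Encoding.UTF8.GetString(bytes.Take(bytes.Count - 3).ToArray());
                bytes.Clear();
            }
        }
        // Return whatever follows the last terminator
        if ( bytes.Count > 0 )
            yield return Encoding.UTF8.GetString(bytes.ToArray());
    }
}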
Answer 2: (score: 0)
This code works well... it reads every character.
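int iReadLength = 100;
// Allocate the buffer once and reuse it for every block
char[] acBuf = new char[iReadLength];
while (srFile.Peek() >= 0) {
    // Read returns the number of characters actually read,
    // which can be less than iReadLength on the last block
    int iCharsRead = srFile.Read(acBuf, 0, iReadLength);
    // Build the string only from the characters that were read
    string s = new string(acBuf, 0, iCharsRead);
}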
Answer 3: (score: 0)
A very simple solution (not optimized for memory consumption) could be:
var allLines = File.ReadAllText(gstPathFileName)
.Split('\n');
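If you need to remove the trailing carriage-return characters, do the following:
for(var i = 0; i < allLines.Length; ++i)
    allLines[i] = allLines[i].TrimEnd('\r');
If needed, you can put the relevant processing inside the for loop. Or, if you don't want to keep the array, use this instead of the for loop: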
foreach(var line in allLines.Select(x => x.TrimEnd('\r')))
{
    // use 'line' here ...
}
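As a quick way to try this approach end to end, the sketch below writes a small temporary file with mixed line endings and runs it through the same ReadAllText / Split / TrimEnd steps. The file contents and the names SplitFileDemo and ReadTrimmedLines are invented for the example; only the three steps themselves come from the answer.

```csharp
using System;
using System.IO;
using System.Linq;

public static class SplitFileDemo
{
    // Reads the whole file, splits on '\n', and strips trailing '\r' characters.
    public static string[] ReadTrimmedLines(string path)
    {
        return File.ReadAllText(path)
            .Split('\n')
            .Select(x => x.TrimEnd('\r'))
            .ToArray();
    }

    public static void Main()
    {
        // Hypothetical test file: two lines ending in \r\r\n (the second one blank),
        // one line ending in \r\n, and a last line with no terminator.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, "first\r\r\n\r\r\nsecond\r\nthird");
            foreach (string line in ReadTrimmedLines(path))
            {
                Console.WriteLine("[" + line + "]");
            }
            // Prints [first], [], [second], [third] on separate lines.
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

One thing to keep in mind: if the file ends with a line terminator, Split('\n') returns one extra empty string at the end of the array, which may need to be dropped.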