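I have a large text file whose fields are separated by a string (not by a single character), like this:
first data[STRING-SEPERATOR]second data[STRING-SEPERATOR] ......
I don't want to load the whole file into memory because of its size (~250 MB). If I read the whole file with System.IO.File.ReadAllText, I get an OutOfMemoryException.
So I want to read the file up to the first occurrence of [STRING-SEPERATOR], then continue with the next string. It is like "taking" first data off the file, processing it, and then continuing with second data, which is now the first data in the file.
System.IO.StreamReader.ReadLine() doesn't help me, because the file's content is a single line.
Do you have any idea how to read a file up to a certain string in .NET?
I'd appreciate any ideas. Thanks.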
Answer 0 (score: 1)
This should help you.
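// Namespaces used by the two methods below.
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Windows.Forms; // only needed for MessageBox

// Reads the file in chunks of chunkSize characters. StreamReader does the decoding,
// so a multi-byte character is never split across two chunks.
private IEnumerable<string> ReadCharsByChunks(int chunkSize, string filePath)
{
    using (StreamReader reader = new StreamReader(filePath, Encoding.Default))
    {
        char[] buffer = new char[chunkSize];
        int currentRead;
        while ((currentRead = reader.Read(buffer, 0, chunkSize)) > 0)
        {
            yield return new string(buffer, 0, currentRead);
        }
    }
}

private void SearchWord(string searchWord)
{
    StringBuilder builder = new StringBuilder();
    foreach (var chars in ReadCharsByChunks(2, "sample.txt")) // chunk size can be any positive number
    {
        builder.Append(chars);
        var existing = builder.ToString();
        int foundIndex;
        // Loop, because a larger chunk may contain several occurrences.
        while ((foundIndex = existing.IndexOf(searchWord, StringComparison.Ordinal)) >= 0)
        {
            // Found: existing.Substring(0, foundIndex) is the text before the separator.
            MessageBox.Show("Found");
            builder.Remove(0, foundIndex + searchWord.Length);
            existing = builder.ToString();
        }
        // No match left: keep only the tail that could still be the start of a
        // separator completed by the next chunk, so the buffer stays small.
        int keep = searchWord.Length - 1;
        if (builder.Length > keep)
        {
            builder.Remove(0, builder.Length - keep);
        }
    }
}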
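Answer 1 (score: 0)
Text files can also be read character by character, as described in this question. To search for a certain string, you need some hand-written logic that matches the desired string against the incoming characters; this can be done with a state machine.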
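A minimal sketch of the state-machine idea, under a few assumptions: the class name SeparatorReader and the method name ReadPieces are made up for illustration, and the state machine used here is the Knuth-Morris-Pratt failure table, which is one possible choice rather than anything this answer prescribes. The state is simply the number of separator characters matched so far.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the .NET class library.
public static class SeparatorReader
{
    // Yields each piece of text found between occurrences of separator.
    // Reads one character at a time, so memory use is bounded by the longest piece.
    public static IEnumerable<string> ReadPieces(TextReader reader, string separator)
    {
        if (string.IsNullOrEmpty(separator))
            throw new ArgumentException("Separator must not be empty.", nameof(separator));

        // failure[i] = length of the longest proper prefix of separator[0..i]
        // that is also a suffix of it (the KMP failure table).
        int[] failure = new int[separator.Length];
        for (int i = 1, k = 0; i < separator.Length; i++)
        {
            while (k > 0 && separator[i] != separator[k]) k = failure[k - 1];
            if (separator[i] == separator[k]) k++;
            failure[i] = k;
        }

        var piece = new StringBuilder();
        int state = 0; // number of separator characters matched so far
        int next;
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;
            piece.Append(c);

            // On a mismatch, fall back to the longest shorter match that still fits.
            while (state > 0 && c != separator[state]) state = failure[state - 1];
            if (c == separator[state]) state++;

            if (state == separator.Length)
            {
                // The piece now ends with the separator: cut it off and emit the rest.
                piece.Length -= separator.Length;
                yield return piece.ToString();
                piece.Clear();
                state = 0;
            }
        }

        // Text after the last separator (nothing if the input ends with one).
        if (piece.Length > 0) yield return piece.ToString();
    }
}
```

To use it on the file from the question, wrap the file in a StreamReader inside a using block and iterate with foreach over SeparatorReader.ReadPieces(reader, "[STRING-SEPERATOR]"). StreamReader buffers internally, so calling Read() once per character does not mean one disk access per character.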
Answer 2 (score: 0)
StreamReader.Read has some overloads that might help you. Try this:
int count = 200; // or whatever number you think is better
char[] buffer = new char[count];
using (System.IO.StreamReader sr = new System.IO.StreamReader("Path here"))
{
    int read;
    // Read returns the number of characters actually read, and 0 at the end of the stream.
    // The second argument is the offset into buffer, not a position in the file.
    while ((read = sr.Read(buffer, 0, count)) > 0)
    {
        /*
        Only buffer[0..read-1] is valid on each pass.
        Check if it contains your string separator, or at least the first part of it at the very end.
        If it ends with a part of it, you need to check the next read to make sure it's a real separator.
        Do your stuff, then carry on from one character after the last separator.
        */
    }
}
Answer 3 (score: 0)
Thanks for your replies. Here is the function I wrote in VB.NET:
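' Reads from the stream's current position up to (not including) the next occurrence of
' UntilText and leaves the stream positioned right after it. If UntilText is not found,
' the remaining text is returned.
' Assumes a single-byte encoding: the Position arithmetic below treats one character as one byte.
Public Function ReadUntil(Stream As System.IO.FileStream, UntilText As String) As String
    Dim builder As New System.Text.StringBuilder()
    Dim returnTextBuilder As New System.Text.StringBuilder()
    Dim returnText As String = String.Empty
    ' Always use at least a 1-byte buffer; a zero-length buffer would make Read return 0 at once.
    Dim size As Integer = Math.Max(UntilText.Length \ 2, 1) - 1
    Dim buffer(size) As Byte
    Dim currentRead As Integer = -1
    Do Until currentRead = 0
        Dim collected As String = Nothing
        Dim chars As String = Nothing
        Dim foundIndex As Integer = -1
        currentRead = Stream.Read(buffer, 0, buffer.Length)
        chars = System.Text.Encoding.Default.GetString(buffer, 0, currentRead)
        builder.Append(chars)
        returnTextBuilder.Append(chars)
        collected = builder.ToString()
        foundIndex = collected.IndexOf(UntilText)
        If (foundIndex >= 0) Then
            returnText = returnTextBuilder.ToString()
            Dim indexOfSep As Integer = returnText.IndexOf(UntilText)
            Dim cutLength As Integer = returnText.Length - indexOfSep
            ' Drop the separator and anything that was read past it.
            returnText = returnText.Remove(indexOfSep, cutLength)
            builder.Remove(0, foundIndex + UntilText.Length)
            If (cutLength > UntilText.Length) Then
                ' Rewind so the next call starts right after the separator.
                Stream.Position = Stream.Position - (cutLength - UntilText.Length)
            End If
            Return returnText
        ElseIf (Not collected.Contains(UntilText.First())) Then
            ' Nothing collected so far can be the start of the separator.
            builder.Length = 0
        End If
    Loop
    ' End of file: return the text after the last separator instead of dropping it.
    Return returnTextBuilder.ToString()
End Function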