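I am using string.Split() in my C# code to read a tab-separated file. I am getting an OutOfMemoryException at the line marked in the code sample below.
I would like to know why a file of only 16 MB causes this problem.
Is this the right approach?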
using (StreamReader reader = new StreamReader(_path))
{
//...........Load the first line of the file................
string headerLine = reader.ReadLine();
MeterDataIPValueList objMeterDataList = new MeterDataIPValueList();
string[] seperator = new string[1]; // used to separate the lines of the file
seperator[0] = "\r\n";
//.............Load Records of file into string array and remove all empty lines of file.................
string[] line = reader.ReadToEnd().Split(seperator, StringSplitOptions.RemoveEmptyEntries);
int noOfLines = line.Count();
if (noOfLines == 0)
{
mFileValidationErrors.Append(ConstMsgStrings.headerOnly + Environment.NewLine);
}
//...............If file contains records also with header line..............
else
{
string[] headers = headerLine.Split('\t');
int noOfColumns = headers.Count();
//.........Create table structure.............
objValidateRecordsTable.Columns.Add("SerialNo");
objValidateRecordsTable.Columns.Add("SurveyDate");
objValidateRecordsTable.Columns.Add("Interval");
objValidateRecordsTable.Columns.Add("Status");
objValidateRecordsTable.Columns.Add("Consumption");
//........Fill objValidateRecordsTable table by string array contents ............
int recordNumber; // used for log
#region ..............Fill objValidateRecordsTable.....................
seperator[0] = "\t";
for (int lineNo = 0; lineNo < noOfLines; lineNo++)
{
recordNumber = lineNo + 1;
string[] recordFields = line[lineNo].Split(seperator, StringSplitOptions.RemoveEmptyEntries); // OutOfMemoryException is thrown here when splitting the columns
if (recordFields.Count() == noOfColumns)
{
//Do processing
}
Answer 0 (score: 12)
Split is poorly implemented and has serious performance problems when applied to large strings. See this article for details on the memory requirements of the Split function:
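What happens when you split a string containing 1355049 comma-separated substrings of 16 characters each, with a total length of 25745930 characters?
1. An array of pointers to string objects: contiguous virtual address space of 4 (address pointer) * 1355049 = 5420196 (array size) + 16 (for bookkeeping) = 5420212 bytes.
2. Non-contiguous virtual address space for 1355049 strings of 54 bytes each. This does not mean that all 1.3 million strings will be scattered across the heap, but they will not be allocated on the LOH. The GC will allocate them in bunches on the Gen0 heap.
3. Split will create an internal System.Int32[] array of size 25745930, consuming 102983736 bytes (~98 MB) of the LOH, which is very expensive.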
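The figures quoted above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes the same 32-bit layout as the quote (4-byte references and 16 bytes of array bookkeeping).

```csharp
using System;

public static class SplitMemoryEstimate
{
    // Assumption: 32-bit process, so an object reference is 4 bytes and
    // each array carries 16 bytes of bookkeeping, as in the quoted figures.
    private const long PointerSize = 4;
    private const long ArrayOverhead = 16;

    // Estimated size of the string[] that holds the split results.
    public static long ResultArrayBytes(long substringCount)
    {
        return PointerSize * substringCount + ArrayOverhead;
    }

    // Estimated size of the internal int[] with one slot per input character.
    public static long SeparatorIndexArrayBytes(long totalChars)
    {
        return sizeof(int) * totalChars + ArrayOverhead;
    }
}
```

With the numbers from the quote, ResultArrayBytes(1355049) gives 5420212 and SeparatorIndexArrayBytes(25745930) gives 102983736. The second figure grows with the length of the input string rather than with the number of separators, which is why splitting one very large string is so much more expensive than splitting many short lines.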
Answer 1 (score: 10)
First, try not to read the whole file into an array with reader.ReadToEnd(); read the file directly line by line instead.
using (StreamReader sr = new StreamReader(this._path))
{
string line = "";
while(( line= sr.ReadLine()) != null)
{
string[] cells = line.Split(new string[] { "\t" }, StringSplitOptions.None);
if (cells.Length > 0)
{
}
}
}
Answer 2 (score: 4)
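I would recommend reading line by line if you can, but sometimes the data is not meant to be split on new lines.
In that case you can always write your own memory-efficient split. This solved my problem.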
private static IEnumerable<string> CustomSplit(string newtext, char splitChar)
{
var result = new List<string>();
var sb = new StringBuilder();
foreach (var c in newtext)
{
if (c == splitChar)
{
if (sb.Length > 0)
{
result.Add(sb.ToString());
sb.Clear();
}
continue;
}
sb.Append(c);
}
if (sb.Length > 0)
{
result.Add(sb.ToString());
}
return result;
}
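The method above still collects every field in a List before returning. As a possible variation, not part of the original answer, the same idea can be written as an iterator so that fields are produced one at a time. This is a minimal sketch; the names LazySplitter and LazySplit are hypothetical, and like CustomSplit it skips empty fields.

```csharp
using System.Collections.Generic;

public static class LazySplitter
{
    // Yields the non-empty fields of text one at a time, without building
    // an intermediate list or a per-character index array.
    public static IEnumerable<string> LazySplit(string text, char splitChar)
    {
        int start = 0; // index where the current field begins
        for (int i = 0; i <= text.Length; i++)
        {
            // A field ends at a separator or at the end of the string.
            if (i == text.Length || text[i] == splitChar)
            {
                if (i > start)
                {
                    yield return text.Substring(start, i - start);
                }
                start = i + 1;
            }
        }
    }
}
```

A caller can consume the result with foreach and stop early. Only the field currently being read is allocated in addition to the input string.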
Answer 3 (score: 3)
I use my own. It has been tested with 10 unit tests.
public static class StringExtensions
{
// The string.Split() method in .NET tends to run out of memory on 80 MB strings.
// This has been reported in several places online.
// This version is fast and memory efficient and returns no empty or whitespace-only entries.
public static List<string> LowMemSplit(this string s, string seperator)
{
List<string> list = new List<string>();
int lastPos = 0;
int pos = s.IndexOf(seperator);
while (pos > -1)
{
while(pos == lastPos)
{
lastPos += seperator.Length;
pos = s.IndexOf(seperator, lastPos);
if (pos == -1)
return list;
}
string tmp = s.Substring(lastPos, pos - lastPos);
if(tmp.Trim().Length > 0)
list.Add(tmp);
lastPos = pos + seperator.Length;
pos = s.IndexOf(seperator, lastPos);
}
if (lastPos < s.Length)
{
string tmp = s.Substring(lastPos, s.Length - lastPos);
if (tmp.Trim().Length > 0)
list.Add(tmp);
}
return list;
}
}
Answer 4 (score: 1)
Try reading the file line by line instead of splitting the entire content.
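One way to do this, sketched here under assumptions, is File.ReadLines, which enumerates the file lazily instead of loading it into a single string. The names TabFileReader and ReadRecords are hypothetical. The sketch assumes the first line is a header and that empty lines should be skipped, as in the question.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class TabFileReader
{
    // Lazily yields the tab-separated fields of each non-empty data line.
    // Assumption: the first line of the file is a header and is skipped.
    public static IEnumerable<string[]> ReadRecords(string path)
    {
        foreach (string line in File.ReadLines(path).Skip(1))
        {
            if (line.Length == 0)
            {
                continue; // ignore empty lines
            }
            // Empty fields are kept so that columns stay aligned.
            yield return line.Split('\t');
        }
    }
}
```

Each record can then be validated against the expected column count and added to the table inside a foreach loop, so only one line is held in memory at a time. Unlike the question's code, this keeps empty fields. If dropping them is intended, StringSplitOptions.RemoveEmptyEntries can be passed as in the original.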