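Is there a low-cost way to test whether the first line of a file ends with LF instead of CRLF?
We receive a lot of files from our customers, and a few of them send us files whose EOL terminator is LF instead of CRLF. We import with SSIS, so I need the line terminators to be consistent. (When I open a file in Notepad++ I can see the lines ending with LF instead of CRLF.)
If I read the first line of the file with StreamReader.ReadLine, the line does not appear to contain any kind of terminator. I tested line.Contains(vbLf), vbCr and vbCrLf, and all of them came back False.
I could read the whole file into memory and test for vbLf, but some of the files we receive are quite large (25 MB), and that seems like a big waste of resources just to look at the terminator of the first line. In the worst case I could rewrite every line of every file we receive as line + System.Environment.NewLine, but that is also wasted work for files that already use CRLF.
EDIT: final code below, based on @icemanind's answer (an SSIS Script Task that is passed a directory variable)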

    ' Requires "Imports System.IO" at the top of the script file.
    Public Sub Main()
        ' Gets the directory from the SSIS variable, lists its files and fixes each one
        Dim sDirectory As String = Dts.Variables("User::DataSourceDir").Value.ToString()
        Dim dirList As New DirectoryInfo(sDirectory)
        Dim fileList As FileInfo() = dirList.GetFiles()
        For Each currentFile As FileInfo In fileList
            ReplaceBadEol(currentFile)
        Next
        Dts.TaskResult = ScriptResults.Success
    End Sub

    ' Temp filename postfix
    Private Const fileNamePostFix As String = "_Temp.txt"

    ' Tests whether the file has a valid end-of-line terminator and rewrites it if it doesn't
    Private Sub ReplaceBadEol(currentFileInfo As FileInfo)
        Dim fullName As String = currentFileInfo.FullName
        If FirstLineEndsWithCrLf(fullName) Then Exit Sub

        Dim fileContent As String() = GetFileContent(fullName)
        Dim pureFileName As String = Path.GetFileNameWithoutExtension(fullName)
        Dim newFileName As String = Path.Combine(currentFileInfo.DirectoryName, pureFileName & fileNamePostFix)
        ' File.WriteAllLines ends every line with Environment.NewLine (CRLF on Windows)
        File.WriteAllLines(newFileName, fileContent)
        currentFileInfo.Delete()
        File.Move(newFileName, fullName)
    End Sub

    ' Enum describing which terminator was found
    Private Enum Terminators
        None = 0
        CrLf = 1
        Lf = 2
        Cr = 3
    End Enum

    ' Skips ahead by the length of the first line, then reads on until the first terminator.
    ' The seek is in bytes, so it lands at or before the terminator as long as each
    ' character takes at least one byte.
    Private Function GetTerminator(fileName As String, length As Integer) As Terminators
        Using sr As New StreamReader(fileName)
            sr.BaseStream.Seek(length, SeekOrigin.Begin)
            Dim data As Integer = sr.Read()
            While data <> -1
                If data = 13 Then
                    ' CR found: CRLF if an LF follows, otherwise a bare CR
                    data = sr.Read()
                    If data = 10 Then
                        Return Terminators.CrLf
                    End If
                    Return Terminators.Cr
                End If
                If data = 10 Then
                    Return Terminators.Lf
                End If
                data = sr.Read()
            End While
        End Using
        Return Terminators.None
    End Function

    ' Checks whether the file is empty; if not, checks the first EOL terminator
    Private Function FirstLineEndsWithCrLf(fileName As String) As Boolean
        Dim line As String
        Using reader As New StreamReader(fileName)
            ' ReadLine returns Nothing for an empty file
            line = reader.ReadLine()
        End Using

        ' Test for Nothing before touching line.Length, otherwise an empty file
        ' throws a NullReferenceException. A blank first line is also treated as valid.
        If String.IsNullOrWhiteSpace(line) Then
            Return True
        End If
        Return GetTerminator(fileName, line.Length) = Terminators.CrLf
    End Function

    ' Reads all lines into a String array
    Private Function GetFileContent(fileName As String) As String()
        Return File.ReadAllLines(fileName)
    End Function

Answer 0 (score: 2)
The reason your line tests negative for vbCrLf, vbLf and vbCr is that ReadLine strips them. From the StreamReader.ReadLine documentation:
A line is defined as a sequence of characters followed by a line feed ("\n"),
a carriage return ("\r"), or a carriage return immediately followed by a line
feed ("\r\n"). The string that is returned does not contain the terminating
carriage return or line feed.
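
As a quick illustration of the quoted behavior, here is a minimal sketch that reads from an in-memory stream instead of a real file. The module name and the sample text are made up for the demo.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports Microsoft.VisualBasic

Module ReadLineDemo
    Sub Main()
        ' In-memory stand-in for a file: first line ends with LF, second with CRLF
        Dim bytes As Byte() = Encoding.ASCII.GetBytes("first" & vbLf & "second" & vbCrLf)
        Using reader As New StreamReader(New MemoryStream(bytes))
            Dim line As String = reader.ReadLine()
            Console.WriteLine(line.Length)           ' prints 5: the LF is not part of the string
            Console.WriteLine(line.Contains(vbLf))   ' prints False
            Console.WriteLine(reader.ReadLine())     ' prints second
        End Using
    End Sub
End Module
```

Both lines come back without their terminators, so the terminator has to be inspected at the character level rather than through ReadLine.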
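If you want all of the lines joined with carriage return + line feed, try this:

    ' Requires Imports System.IO and System.Linq.
    ' Aggregate without a seed throws InvalidOperationException if the file has no lines.
    Dim lines As String() = File.ReadAllLines("myfile.txt")
    Dim data As String = lines.Aggregate(Function(i, j) i + vbCrLf + j)

This reads all of the lines of the file and then uses a bit of Linq to join them with a carriage return and line feed between each pair.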
Edit
If you only want to determine what the first line terminator is, try this function:

    Private Enum Terminators
        None = 0
        CrLf = 1
        Lf = 2
        Cr = 3
    End Enum

    Private Shared Function GetTerminator(fileName As String) As Terminators
        Using sr = New StreamReader(fileName)
            ' Read one character at a time until the first CR or LF
            Dim data As Integer = sr.Read()
            While data <> -1
                If data = 13 Then
                    ' CR found: CRLF if an LF follows, otherwise a bare CR
                    data = sr.Read()
                    If data = 10 Then
                        Return Terminators.CrLf
                    End If
                    Return Terminators.Cr
                End If
                If data = 10 Then
                    Return Terminators.Lf
                End If
                data = sr.Read()
            End While
        End Using
        Return Terminators.None
    End Function

Just call this function, passing in a file name, and it will return "Cr", "Lf", "CrLf", or "None" if the file has no line terminator.
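
Once a file is known to need fixing, it can be rewritten one line at a time, so that even a 25 MB file never has to sit in memory as a whole. The following is only a sketch under a few assumptions: RewriteWithCrLf is a hypothetical helper name, the demo uses temp files, and the output is written in StreamWriter's default encoding (UTF-8), which may not match the encoding your customers send.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module CrLfRewriteSketch
    ' Hypothetical helper: copies sourcePath to targetPath line by line,
    ' ending every line with CRLF regardless of the original terminator.
    Sub RewriteWithCrLf(sourcePath As String, targetPath As String)
        Using reader As New StreamReader(sourcePath)
            Using writer As New StreamWriter(targetPath)
                writer.NewLine = vbCrLf   ' explicit, rather than relying on Environment.NewLine
                Dim line As String = reader.ReadLine()
                While line IsNot Nothing
                    writer.WriteLine(line)
                    line = reader.ReadLine()
                End While
            End Using
        End Using
    End Sub

    Sub Main()
        ' Demo with temp files: an LF-terminated input becomes CRLF-terminated output
        Dim source As String = Path.GetTempFileName()
        Dim target As String = Path.GetTempFileName()
        File.WriteAllText(source, "a" & vbLf & "b" & vbLf)
        RewriteWithCrLf(source, target)
        Console.WriteLine(File.ReadAllText(target) = "a" & vbCrLf & "b" & vbCrLf)   ' prints True
        File.Delete(source)
        File.Delete(target)
    End Sub
End Module
```

Note that a final line without a terminator gains one in the output, because WriteLine always appends the terminator.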