My CSV file looks like this:
"Name1", "A test, which "fails" all the time"
"Name2", "A test, which "fails" all the time"
"Name3", "A test, which "fails" all the time"
My code is:
Using parser As New FileIO.TextFieldParser(filepath)
    parser.Delimiters = New String() {","}
    parser.HasFieldsEnclosedInQuotes = True
    parser.TrimWhiteSpace = False
    Dim currentRow As String()
    While Not parser.EndOfData
        Try
            currentRow = parser.ReadFields()
        Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
            MsgBox("Line " & ex.Message &
                   "is not valid and will be skipped.")
        Finally
        End Try
    End While
End Using
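The error I get is: "Line 1 cannot be parsed using the current delimiters.is not valid and will be skipped." At first I thought the commas were the problem, but it looks like the problem is the quotes inside the quotes.
Any ideas on how to read this?
PS. The files my code has to handle usually have no quotes inside the quotes, so I am looking for a fast, reliable, general way to read the files. From what I have read, regex is very heavy for this purpose.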
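Answer 0 (score: 0)
This file contains invalid CSV and generally cannot be parsed, so you should fix the source of the "mess". If you can't do that, you can write a method that tries to repair the line: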
Imports System.Linq
Imports Microsoft.VisualBasic.FileIO

Function FixRowFieldsQuoteIssue(parser As TextFieldParser) As String()
    If Not parser.HasFieldsEnclosedInQuotes Then Return Nothing ' method only fixes quote issues
    Dim errorLine As String = parser.ErrorLine
    If String.IsNullOrWhiteSpace(errorLine) Then Return Nothing ' empty line, no quote issue
    errorLine = errorLine.Trim()
    If Not errorLine.StartsWith("""") Then Return Nothing ' must start with a quote, otherwise the fix is not supported

    Dim lineFields As New List(Of String)
    Dim insideField As Boolean = False
    Dim currentField As New List(Of Char)

    For i As Int32 = 0 To errorLine.Length - 1
        Dim c As Char = errorLine(i)
        Dim isQuote As Boolean = (c = """"c)
        If insideField Then
            If isQuote Then
                If i = errorLine.Length - 1 OrElse
                   parser.Delimiters.Contains(errorLine(i + 1).ToString()) Then
                    ' end of line or delimiter follows, this is a valid end-of-field quote
                    ' can be improved by skipping spaces until the delimiter
                    insideField = False
                    lineFields.Add(String.Concat(currentField))
                    currentField = New List(Of Char)
                Else
                    ' next char is not a delimiter, so this quote is invalid
                    ' add it to the regular field chars to fix it
                    currentField.Add(c)
                End If
            Else
                ' regular char, add it to the current field chars
                currentField.Add(c)
            End If
        ElseIf isQuote Then
            ' opening quote; delimiters and spaces between fields are ignored
            insideField = True
        End If
    Next

    Return lineFields.ToArray()
End Function
Call it from the Catch block:
Dim allRowFields As New List(Of String())
Using parser As New FileIO.TextFieldParser("filePath")
    parser.Delimiters = New String() {","}
    parser.HasFieldsEnclosedInQuotes = True
    parser.TrimWhiteSpace = False
    While Not parser.EndOfData
        Try
            Dim currentRowFields As String() = parser.ReadFields()
            allRowFields.Add(currentRowFields)
        Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
            ' ReadFields failed: try to repair the offending line before giving up
            Dim fixedFields As String() = FixRowFieldsQuoteIssue(parser)
            If fixedFields IsNot Nothing Then
                allRowFields.Add(fixedFields)
            Else
                MsgBox("Line " & ex.Message & " is not valid and will be skipped.")
            End If
        End Try
    End While
End Using
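The comment in the method above notes that it could be improved by skipping spaces before the delimiter. A minimal sketch of that idea follows, written against a plain string so it can be tried without a TextFieldParser. The names RepairQuotedLine and QuoteRepairSketch are hypothetical, and the sketch assumes a single-character delimiter and that every field is wrapped in quotes.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text

Module QuoteRepairSketch
    ' Hypothetical helper (not part of TextFieldParser): splits one line whose
    ' fields are all wrapped in quotes, keeping stray inner quotes as text.
    Function RepairQuotedLine(line As String, delimiter As Char) As String()
        Dim fields As New List(Of String)
        Dim current As New StringBuilder()
        Dim insideField As Boolean = False
        For i As Integer = 0 To line.Length - 1
            Dim c As Char = line(i)
            If Not insideField Then
                ' Outside a field only an opening quote matters;
                ' delimiters and padding spaces are skipped.
                If c = """"c Then insideField = True
                Continue For
            End If
            If c <> """"c Then
                current.Append(c)
                Continue For
            End If
            ' A quote inside a field: look past any spaces that follow it.
            Dim j As Integer = i + 1
            While j < line.Length AndAlso Char.IsWhiteSpace(line(j))
                j += 1
            End While
            If j = line.Length OrElse line(j) = delimiter Then
                ' End of line or delimiter follows: this quote closes the field.
                fields.Add(current.ToString())
                current.Clear()
                insideField = False
            Else
                ' Anything else follows: treat the quote as ordinary text.
                current.Append(c)
            End If
        Next
        Return fields.ToArray()
    End Function

    Sub Main()
        ' Note the extra space before the comma, which the lookahead tolerates.
        Dim line As String = """Name1"" , ""A test, which ""fails"" all the time"""
        For Each field As String In RepairQuotedLine(line, ","c)
            Console.WriteLine(field)
        Next
    End Sub
End Module
```

For the sample line this should yield two fields, Name1 and A test, which "fails" all the time. The heuristic is still a guess: a field whose text contains a quote followed by a comma would be split at that point, and a field that is never closed is dropped.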
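Answer 1 (score: 0)
Because the CSV data is malformed, you will need to parse it manually. Fortunately, since you only have two fields and the first field never contains the invalid formatting, you can do this by simply getting the index of the first comma and splitting the fields there.
Here is a simple example: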
Imports System.Data

Private Function Parse_CSV(ByVal csv As String) As DataTable
    'Create a new instance of a DataTable and create the two columns
    Dim dt As DataTable = New DataTable("CSV")
    dt.Columns.AddRange({New DataColumn("Column1"), New DataColumn("Column2")})

    'Removes one leading and one trailing quote when both are present
    Dim unquote = Function(s As String) If(s.Length >= 2 AndAlso s.StartsWith("""") AndAlso s.EndsWith(""""), s.Substring(1, s.Length - 2), s)

    'Placeholder variable for the separator
    Dim separator As Integer = -1

    'Iterate through each line in the data
    For Each line As String In csv.Split({Environment.NewLine}, StringSplitOptions.None)
        'Get the first instance of a comma
        separator = line.IndexOf(","c)

        'Check to make sure the data has two fields
        If separator = -1 Then
            Throw New MissingFieldException("The current line is missing a separator: " & line)
        ElseIf separator = line.Length - 1 Then
            Throw New MissingFieldException("The separator cannot appear at the end of the line, this is occurring at: " & line)
        Else
            'Add the two fields to the DataTable (trimming padding and the starting and ending quotes)
            dt.Rows.Add(unquote(line.Substring(0, separator).Trim()), unquote(line.Substring(separator + 1).Trim()))
        End If
    Next

    'Return the data
    Return dt
End Function
Answer 2 (score: 0)
This splits your CSV into 2 columns and leaves the inner quotes in place. Replace xline with one line of your CSV.
Dim xdata As New List(Of KeyValuePair(Of String, String))
Dim xline As String = """Name3"", ""A test, which ""fails"" all the time"""
Dim FirstCol As Integer = Strings.InStr(xline, ",")
' First column: everything before the first comma, with all quotes removed
Dim key As String = Strings.Left(xline, FirstCol - 1).Replace(Chr(34), "")
' Second column: skip the comma and the space, then drop the outer pair of quotes
Dim value As String = Strings.Mid(xline, FirstCol + 2)
value = value.Substring(1, value.Length - 2)
xdata.Add(New KeyValuePair(Of String, String)(key, value))
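To apply the same split to every line instead of a single xline, something along these lines might work. It is only a sketch: SplitLines, Unquote and TwoColumnSketch are hypothetical names, and it assumes exactly two columns where the first column never contains a comma.

```vbnet
Imports System
Imports System.Collections.Generic

Module TwoColumnSketch
    ' Hypothetical helper: splits each line at its first comma and strips
    ' one pair of surrounding quotes from each half.
    Function SplitLines(lines As IEnumerable(Of String)) As List(Of KeyValuePair(Of String, String))
        Dim result As New List(Of KeyValuePair(Of String, String))
        For Each line As String In lines
            Dim comma As Integer = line.IndexOf(","c)
            If comma < 0 Then Continue For ' no separator: skip blank or malformed lines
            Dim key As String = Unquote(line.Substring(0, comma))
            Dim value As String = Unquote(line.Substring(comma + 1))
            result.Add(New KeyValuePair(Of String, String)(key, value))
        Next
        Return result
    End Function

    ' Trims padding, then removes one leading and one trailing quote if both exist.
    Function Unquote(text As String) As String
        Dim t As String = text.Trim()
        If t.Length >= 2 AndAlso t(0) = """"c AndAlso t(t.Length - 1) = """"c Then
            Return t.Substring(1, t.Length - 2)
        End If
        Return t
    End Function

    Sub Main()
        ' In real use the lines could come from System.IO.File.ReadLines(path).
        Dim sample As String() = {
            """Name1"", ""A test, which ""fails"" all the time""",
            """Name2"", ""A test, which ""fails"" all the time"""
        }
        For Each pair As KeyValuePair(Of String, String) In SplitLines(sample)
            Console.WriteLine(pair.Key & " -> " & pair.Value)
        Next
    End Sub
End Module
```

Lines without a comma are silently skipped here; throwing or logging instead may be preferable if missing rows need to be noticed.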
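Answer 3 (score: 0)
You can try Cinchoo ETL, an open-source library for reading and writing CSV files.
There are a couple of ways you can parse the file.
Method 1: Specify the column names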
using (var parser = new ChoCSVReader("NestedQuotes.csv")
    .WithFields("name", "desc")
    )
{
    foreach (dynamic x in parser)
        Console.WriteLine(x.name + "-" + x.desc);
}
Method 2: Access by index (without specifying column names)
using (var parser = new ChoCSVReader("NestedQuotes.csv"))
{
    foreach (dynamic x in parser)
        Console.WriteLine(x[0] + "-" + x[1]);
}
Hope it helps.
For more help, please read the following CodeProject article: https://www.codeproject.com/Articles/1145337/Cinchoo-ETL-CSV-Reader