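So, after some research, I was able to find the format I need to convert the CSV file into:
Subject,Start Date,Start Time,End Date,End Time,All Day Event,Description,Location,Private
The problem is that the CSV export I am working with is not in the right format or order. What is the best way to gather that information? Here is a bit of my source.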
Name,User Name,Row Type,Start Date,Start Time,End Time,End Date,Segment Start Date,Type
"Smith, John J",jjs,Shift,5/29/2011,9:30,17:30,5/29/2011,5/29/2011,Regular
"Smith, John J",jjs,Shift,5/30/2011,13:30,17:30,5/30/2011,5/30/2011,Regular
Dim Name As String = ""
Dim UserName As String = ""
Dim Data As String = """Smith, John J"",jj802b,Shift,5/29/2011,9:30,17:30,5/29/2011,5/29/2011,Transfer"
For r As Integer = 1 To 10
Name = Data.Substring(0, Data.LastIndexOf(""""))
Data = Data.Remove(0, Data.LastIndexOf(""""))
UserName = Data.Substring(Data.LastIndexOf(""""), ",")
Next
Answer 0 (score: 3)
Here is the solution:
Dim Name As String = ""
Dim UserName As String = ""
Dim Data As String = """Smith, John J"",jj802b,Shift,5/29/2011,9:30,17:30,5/29/2011,5/29/2011,Transfer"
For r As Integer = 1 To 10
Dim DataArr() As String = DecodeCSV(Data) 'Use DecodeCSV function to regex split the string
Name = DataArr(0) 'Get First item of array as Name
UserName = DataArr(1) 'Get Second item of array as UserName
Next
The excellent DecodeCSV code, by Tim (it needs Imports System.Text.RegularExpressions at the top of the file):
Public Shared Function DecodeCSV(ByVal strLine As String) As String()
Dim strPattern As String
Dim objMatch As Match
' build a pattern
strPattern = "^" ' anchor to start of the string
strPattern += "(?:""(?<value>(?:""""|[^""\f\r])*)""|(?<value>[^,\f\r""]*))"
strPattern += "(?:,(?:[ \t]*""(?<value>(?:""""|[^""\f\r])*)""|(?<value>[^,\f\r""]*)))*"
strPattern += "$" ' anchor to the end of the string
' get the match
objMatch = Regex.Match(strLine, strPattern)
' if RegEx match was ok
If objMatch.Success Then
Dim objGroup As Group = objMatch.Groups("value")
Dim intCount As Integer = objGroup.Captures.Count
Dim arrOutput(intCount - 1) As String
' transfer data to array
For i As Integer = 0 To intCount - 1
Dim objCapture As Capture = objGroup.Captures.Item(i)
arrOutput(i) = objCapture.Value
' replace double-escaped quotes
arrOutput(i) = arrOutput(i).Replace("""""", """")
Next
' return the array
Return arrOutput
Else
Throw New ApplicationException("Bad CSV line: " & strLine)
End If
End Function
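Answer 1 (score: 2)
Depending on the exact content of the CSV file and what its format guarantees, splitting each line on , is sometimes the simplest and fastest way to parse it. Your Name column contains a , that is not a delimiter, which adds a little complexity, but the case is still trivial to handle if you can assume the name always contains exactly one ,.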
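As a rough illustration of the split approach, here is a minimal sketch. The module and function names (SplitSketch, SplitLine) are made up for this example, and it assumes every line starts with a quoted name that contains exactly one comma, as in the sample data; it would break on any other quoted field.

```vbnet
Imports System

Module SplitSketch
    ' Hypothetical helper: splits one data line on commas, then rejoins the
    ' two halves of the quoted name. Assumes the name holds exactly one comma.
    Function SplitLine(ByVal line As String) As String()
        Dim parts As String() = line.Split(","c)
        ' One fewer field than parts, because the name was split in two
        Dim fields(parts.Length - 2) As String
        ' Rejoin the name and strip the surrounding quotes
        fields(0) = (parts(0) & "," & parts(1)).Trim(""""c)
        ' Copy the remaining columns unchanged
        Array.Copy(parts, 2, fields, 1, parts.Length - 2)
        Return fields
    End Function

    Sub Main()
        Dim line As String = """Smith, John J"",jjs,Shift,5/29/2011,9:30,17:30,5/29/2011,5/29/2011,Regular"
        Dim fields As String() = SplitLine(line)
        Console.WriteLine(fields(0)) ' the name, without quotes
        Console.WriteLine(fields(1)) ' the user name
    End Sub
End Module
```

This keeps the parsing to a single Split call per line, at the cost of hard-coding the assumption about the name column.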
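There are libraries for parsing CSV files, and they can be useful. Assuming you do not need to handle every file that conforms to the CSV specification, though, I find them overkill. With that said, you can use the following regular expression, which has named groups for convenience, to parse this CSV file easily: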
"(?<Name>[^"]+?)",(?<UserName>[^,]+?),(?<RowType>[^,]+?),(?<StartDate>[^,]+?),(?<StartTime>[^,]+?),(?<EndTime>[^,]+?),(?<EndDate>[^,]+?),(?<SegmentStartDate>[^,]+?),(?<Type>\w+)
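This creates named capture groups that you can then use to write your new CSV file, like so (the snippet assumes Imports System.Text.RegularExpressions and System.Collections.Specialized, and that SubjectString holds the CSV text):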
Dim ResultList As StringCollection = New StringCollection()
Try
Dim RegexObj As New Regex("""(?<Name>[^""]+?)"",(?<UserName>[^,]+?),(?<RowType>[^,]+?),(?<StartDate>[^,]+?),(?<StartTime>[^,]+?),(?<EndTime>[^,]+?),(?<EndDate>[^,]+?),(?<SegmentStartDate>[^,]+?),(?<Type>\w+)", RegexOptions.IgnoreCase)
Dim MatchResult As Match = RegexObj.Match(SubjectString)
While MatchResult.Success
'Append to new CSV file - MatchResult.Groups("groupname").Value
'Name = MatchResult.Groups("Name").Value
'Start Time = MatchResult.Groups("StartTime").Value
'End Time = MatchResult.Groups("EndTime").Value
'Etc...
MatchResult = MatchResult.NextMatch() 'Advance to the next row; without this the loop never ends
End While
Catch ex As ArgumentException
'Syntax error in the regular expression
End Try
For more information, see .NET Framework Regular Expressions on MSDN.
Answer 2 (score: 2)
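This is going to be long, so bear with me.
Before we begin, I want to note a couple of things.
First, I am using TextFieldParser, which you can find under the FileIO namespace, to work with the input CSV. This makes reading delimited files much easier than wrestling with regular expressions and your own parsing.
Second, I am storing the data as a List(Of Dictionary(Of String, String)), that is, a list of dictionaries that map strings to other strings. In essence this is not much different from the access pattern of a DataTable.
' Create a text parser object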
Dim theParser As New FileIO.TextFieldParser("C:\Path\To\theInput.csv")
' Specify that fields are delimited by commas
theParser.Delimiters = {","}
' Specify that strings containing the delimiter are wrapped by quotes
theParser.HasFieldsEnclosedInQuotes = True
' Dimension containers for the field names and the list of data rows
' Initialize the field names with the first row
Dim theInputFields As String() = theParser.ReadFields(),
theInputRows As New List(Of Dictionary(Of String, String))()
' While there is data to parse
Do While Not theParser.EndOfData
' Dimension a counter and a row container
Dim i As Integer = 0,
theRow As New Dictionary(Of String, String)()
' For each field
For Each value In theParser.ReadFields()
' Associate the value of that field for the row
theRow(theInputFields(i)) = value
' Increment the count
i += 1
Next
' Add the row to the list
theInputRows.Add(theRow)
Loop
' Close the input file for reading
theParser.Close()
' Dimension the list of output field names and a container for the list of formatted output rows
Dim theOutputFields As New List(Of String) From {"Subject", "Start Date", "Start Time", "End Date", "End Time", "All Day Event", "Description", "Location", "Private"},
theOutputRows As New List(Of Dictionary(Of String, String))()
' For each data row we've extracted from the CSV
For Each theRow In theInputRows
' Dimension a new formatted row for the output
Dim thisRow As New Dictionary(Of String, String)()
' For each field name of the output rows
For Each theField In theOutputFields
' Dimension a container for the value of this field
Dim theValue As String = String.Empty
' Specify ways to get the value of the field based on its name
' These are just examples; choose your own method for formatting the output
Select Case theField
Case "Subject"
' Output a subject "[Row Type]: [Name]"
theValue = theRow("Row Type") & ": " & theRow("Name")
Case "Description"
' Output a description from the input field [Type]
theValue = theRow("Type")
Case "Start Date", "Start Time", "End Date", "End Time"
' Output the value of the field with a correlated name
theValue = theRow(theField)
Case "All Day Event", "Private"
' Output False by default (you might want to change the case for Private)
theValue = "False"
Case "Location"
' Can probably be safely left empty unless you'd like a default value
End Select
' Relate the value we've created to the column in this row
thisRow(theField) = theValue
Next
' Add the formatted row to the output data
theOutputRows.Add(thisRow)
Next
' Start building the first line by retrieving the name of the first output field
Dim theHeader As String = theOutputFields.First
' For each of the remaining output fields
For Each theField In (From s In theOutputFields Skip 1)
' Append a comma and then the field name
theHeader = theHeader & "," & theField
Next
' Create a string builder to store the text for the output file, initialized with the header line and a line break
Dim theOutput As New System.Text.StringBuilder(theHeader & vbNewLine)
' For each row in the formatted output rows
For Each theRow In theOutputRows
' Dimension a container for this line of the file, beginning with the value of the column associated with the first output field
Dim theLine As String = theRow(theOutputFields.First)
' Wrap the first value if necessary
If theLine.Contains(",") Then theLine = """" & theLine & """"
' For each remaining output field
For Each theField In (From s In theOutputFields Skip 1)
' Dereference and store the associated column value
Dim theValue As String = theRow(theField)
' Add a comma and the value to the line, wrapped in quotations as needed
theLine = theLine & "," & If(theValue.Contains(","), """" & theValue & """", theValue)
Next
' Append the line to the output string
theOutput.AppendLine(theLine)
Next
' Write the formatted output to file
IO.File.WriteAllText("C:\output.csv", theOutput.ToString)
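If you are more comfortable with a DataTable, you are welcome to use it instead of the List(Of Dictionary(Of String, String)). The list of dictionaries behaves the same way and requires very little setup, which is why it is used here. I admit some of this is hard-coded, but if you need to generalize the procedure you can move certain aspects out to application settings and/or decompose the function better. The point here is to give you a general idea. The code above is commented inline.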
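For what it's worth, running this code on your example data produced an output file that opened correctly in OpenOffice.org Calc. The format you output for each field is up to you, so modify the appropriate Case statement in the Select to suit, and happy coding!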
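One thing to watch in the output loop above: it wraps a value in quotes only when the value contains a comma, so a value that itself contains a double quote would be written unescaped. If your data can contain quotes, a small helper along these lines may help. QuoteField and CsvQuoteSketch are hypothetical names for this sketch and are not part of the answer's code.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module CsvQuoteSketch
    ' Hypothetical helper: quotes a field only when it needs quoting.
    Function QuoteField(ByVal value As String) As String
        Dim special As Char() = {","c, """"c, ControlChars.Cr, ControlChars.Lf}
        If value.IndexOfAny(special) >= 0 Then
            ' Double any embedded quotes, then wrap the whole value in quotes
            Return """" & value.Replace("""", """""") & """"
        End If
        ' Nothing special in the value, so write it as-is
        Return value
    End Function

    Sub Main()
        Dim fields As String() = {"Shift: Smith, John J", "5/29/2011", "9:30"}
        ' Quote each field as needed and join them into one CSV line
        Dim quoted As String() = Array.ConvertAll(Of String, String)(fields, AddressOf QuoteField)
        Console.WriteLine(String.Join(",", quoted))
    End Sub
End Module
```

In the loop, calling QuoteField(theValue) would take the place of the inline If(theValue.Contains(","), ...) expression.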
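Answer 3 (score: 1)
This answer to a similar question suggests using VB's TextFieldParser class, which seems to me a better idea than rolling your own CSV parser. At first glance you already have the data fields you need, namely the start and end dates and times; the rest, except possibly Subject and/or Description, can be left blank or filled with default or fixed values...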