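Can anyone point me in the right direction? Thanks in advance.
I want to build a small application that processes a CSV file by writing its rows to multiple CSV files, one per distinct value in one of the columns (for example, Header1), but I don't know where to start. FYI: the list of values under that header will change every time.
I have been able to read the file into an array using the following code: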
[Read From Comma-Delimited Text Files in Visual Basic][1]
Now I want to process the data based on the first column. For example:
INPUT:
input.csv
"Header1","Header2","Header3","Header4"
"apple","pie","soda","beer"
"apple","cake","milk","wine"
"pear","pie","soda","beer"
"pear","pie","soda","beer"
"orange","pie","soda","beer"
"orange","pie","soda","beer"
OUTPUT:
output1.csv
"Header1","Header2","Header3","Header4"
"apple","pie","soda","beer"
"apple","cake","milk","wine"
output2.csv
"Header1","Header2","Header3","Header4"
"pear","pie","soda","beer"
"pear","pie","soda","beer"
output3.csv
"Header1","Header2","Header3","Header4"
"orange","pie","soda","beer"
"orange","pie","soda","beer"
Answer 0 (score: 0)
Here is what you can do.
Example:
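' Requires: Imports System.IO and Imports System.Linq
' Note: Split is a naive parse; it breaks on fields that contain commas.
Dim lines As String() = File.ReadAllLines("input.csv")
' First column of every line, header line included.
Dim q = (From line In lines
         Let x = line.Split(","c)
         Select x(0)).ToList()
' dist(0) is the header value, so the data categories start at index 1.
Dim dist = q.Distinct().ToList()
' Create one output file per category and write the header row.
' FileMode.Create truncates any file left over from an earlier run.
For j As Integer = 1 To dist.Count - 1
    Using sw As New StreamWriter(File.Open("output" & j & ".csv", FileMode.Create))
        sw.WriteLine(lines(0))
    End Using
Next
' Append each data row to the file that matches its category.
For i As Integer = 1 To q.Count - 1
    ' Debug output: the category and the number of its output file.
    Console.WriteLine(q(i))
    Console.WriteLine(dist.IndexOf(q(i)))
    Using sw As New StreamWriter(File.Open("output" & dist.IndexOf(q(i)) & ".csv", FileMode.Append))
        sw.WriteLine(lines(i))
    End Using
Next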
If the key column is not the first column, change its index in x(0).
Answer 1 (score: 0)
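A more suitable data structure than an array for holding the data would be a Dictionary. It makes it easy to check whether you already have an entry for a particular category (for example, "apple" or "pear"). Then you only need to either add a new entry to the dictionary or append to an existing one.
To create the output files, you iterate over each entry in the dictionary (one entry per file), then over each item in that entry's value (to get the lines in the file).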
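The add-or-append step described above can be sketched in isolation. This is only a minimal sketch, not a full solution: the module name GroupingSketch and the helper AddRow are hypothetical names introduced for illustration, and the rows are hard-coded instead of being read from a file.

```vbnet
Option Infer On
Imports System
Imports System.Collections.Generic

Module GroupingSketch
    ' Hypothetical helper: appends a row to the list for its category,
    ' creating the list the first time the category is seen.
    Sub AddRow(groups As Dictionary(Of String, List(Of String)),
               category As String, row As String)
        Dim rows As List(Of String) = Nothing
        If Not groups.TryGetValue(category, rows) Then
            rows = New List(Of String)
            groups.Add(category, rows)
        End If
        rows.Add(row)
    End Sub

    Sub Main()
        Dim groups As New Dictionary(Of String, List(Of String))
        ' Hard-coded sample rows (an assumption; real rows would come from the CSV file).
        AddRow(groups, "apple", """apple"",""pie"",""soda"",""beer""")
        AddRow(groups, "apple", """apple"",""cake"",""milk"",""wine""")
        AddRow(groups, "pear", """pear"",""pie"",""soda"",""beer""")

        ' Outer loop: one dictionary entry per output file.
        ' Inner loop: one list item per line in that file.
        For Each entry In groups
            Console.WriteLine("Category {0}:", entry.Key)
            For Each row In entry.Value
                Console.WriteLine("  " & row)
            Next
        Next
    End Sub
End Module
```

TryGetValue does the lookup once, so the existence check and the retrieval of the list do not need two separate dictionary accesses.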
Option Infer On
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module Module1

    Sub SeparateCsvToFiles(srcFile As String)
        Dim d As New Dictionary(Of String, List(Of String))
        Dim headers As String()

        Using tfp As New TextFieldParser(srcFile)
            tfp.HasFieldsEnclosedInQuotes = True
            tfp.SetDelimiters(",")
            Dim currentRow As String()

            ' Get the headers
            Try
                headers = tfp.ReadFields()
            Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                Throw New FormatException(String.Format("Could not read header line in ""{0}"".", srcFile))
            End Try

            ' Read the data
            Dim lineNumber As Integer = 1
            While Not tfp.EndOfData
                Try
                    currentRow = tfp.ReadFields()
                    'TODO: Possibly handle the wrong number of entries more gracefully.
                    If currentRow.Count = headers.Count Then
                        ' assume column to sort on is the zeroth one
                        Dim category = currentRow(0)
                        Dim values = String.Join(",", currentRow.Skip(1).Select(Function(s) """" & s & """"))
                        If d.ContainsKey(category) Then
                            d(category).Add(values)
                        Else
                            Dim valuesList As New List(Of String)
                            valuesList.Add(values)
                            d.Add(category, valuesList)
                        End If
                    Else
                        Throw New FormatException(String.Format("Wrong number of entries in line {0} in ""{1}"".", lineNumber, srcFile))
                    End If
                Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                    Throw New FormatException(String.Format("Could not read data line {0} in ""{1}"".", lineNumber, srcFile))
                End Try
                lineNumber += 1
            End While
        End Using

        ' Output the data
        'TODO: Write code to output files to a different directory.
        Dim destDir = Path.GetDirectoryName(srcFile)
        Dim fileNumber As Integer = 1
        Dim headerLine = String.Join(",", headers.Select(Function(s) """" & s & """"))

        'TODO: think up more meaningful names instead of x and y.
        For Each x In d
            Dim destFile = Path.Combine(destDir, "output" & fileNumber.ToString() & ".csv")
            Using sr As New StreamWriter(destFile)
                sr.WriteLine(headerLine)
                For Each y In x.Value
                    sr.WriteLine(String.Format("""{0}"",{1}", x.Key, y))
                Next
            End Using
            fileNumber += 1
        Next
    End Sub

    Sub Main()
        SeparateCsvToFiles("C:\temp\input.csv")
        Console.WriteLine("Done.")
        Console.ReadLine()
    End Sub

End Module