Converting a comma-delimited CSV file into multiple files using ASP.NET

Time: 2014-05-03 19:45:57

Tags: asp.net csv

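Can anyone point me in the right direction? Thanks in advance.

I want to build a small application that processes a CSV file by writing its rows out to multiple CSV files, one per distinct value in one of the columns (e.g. Header1) of the array, but I don't know where to start. FYI: the list of values under that header will always change.

I have been able to read the file into an array using the code from the following article: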

[Read From Comma-Delimited Text Files in Visual Basic][1]

Now I want to process the data based on the first column. For example:

INPUT:

input.csv

"Header1","Header2","Header3","Header4"
"apple","pie","soda","beer"
"apple","cake","milk","wine"
"pear","pie","soda","beer"
"pear","pie","soda","beer"
"orange","pie","soda","beer"
"orange","pie","soda","beer"

OUTPUT:

output1.csv

"Header1","Header2","Header3","Header4"
"apple","pie","soda","beer"
"apple","cake","milk","wine"

output2.csv

"Header1","Header2","Header3","Header4"
"pear","pie","soda","beer"
"pear","pie","soda","beer"

output3.csv

"Header1","Header2","Header3","Header4"
"orange","pie","soda","beer"
"orange","pie","soda","beer"

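2 Answers:

Answer 0: (score: 0)

What you can do is:

  • Read the key column into a list q
  • Create a list dist of the distinct keys
  • Compare each value in q against dist, and write the row to the file numbered by its index in dist

Example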

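' Requires: Imports System.IO and Imports System.Linq
' Note: a plain Split does not handle quoted fields that contain commas.
Dim lines As String() = File.ReadAllLines("input.csv")

' Key column (index 0) of every line, header included, so q(i) matches lines(i).
Dim q = (From line In lines
         Let x = line.Split(","c)
         Select x(0)).ToList()

' Distinct keys; index 0 is the header value, so data keys start at index 1.
Dim dist = q.Distinct().ToList()

' Create one output file per key and write the header line to it.
' FileMode.Create truncates any file left over from an earlier run.
For j As Integer = 1 To dist.Count - 1
    Using sw As New StreamWriter(File.Open("output" & j & ".csv", FileMode.Create))
        sw.WriteLine(lines(0))
    End Using
Next

' Append each data row to the file whose number is the key's index in dist.
For i As Integer = 1 To q.Count - 1
    ' Debug output: the key and the number of the file it goes to.
    Console.WriteLine(q(i))
    Console.WriteLine(dist.IndexOf(q(i)))

    Using sw As New StreamWriter(File.Open("output" & dist.IndexOf(q(i)) & ".csv", FileMode.Append))
        sw.WriteLine(lines(i))
    End Using
Next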

If the key column is not the first column, change its index in x(0).
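As a variation on the same steps, the rows can be grouped in memory first, so that each output file is opened only once instead of once per row. The following is only a sketch under assumptions: it swaps the q/dist index lookup for LINQ's GroupBy, the names SplitByKey, destDir and keyIndex are hypothetical, and it still uses a naive split that assumes no field contains a comma.

```vb
Option Infer On

Imports System.IO
Imports System.Linq

Module SplitByKeySketch

    ' Hypothetical helper: writes output1.csv, output2.csv, ... into destDir,
    ' one file per distinct value found in column keyIndex of srcFile.
    Sub SplitByKey(srcFile As String, destDir As String, keyIndex As Integer)
        Dim lines = File.ReadAllLines(srcFile)
        If lines.Length = 0 Then Return

        ' Skip the header and any blank lines, then group rows by the key column.
        ' Naive split: assumes no quoted field contains a comma.
        Dim dataRows = lines.Skip(1).Where(Function(row) row.Trim().Length > 0)
        Dim groups = dataRows.GroupBy(Function(row) row.Split(","c)(keyIndex))

        Dim fileNumber As Integer = 1
        For Each g In groups
            Dim destFile = Path.Combine(destDir, "output" & fileNumber.ToString() & ".csv")

            ' New StreamWriter(path) overwrites an existing file.
            Using sw As New StreamWriter(destFile)
                sw.WriteLine(lines(0))   ' header line
                For Each row In g
                    sw.WriteLine(row)    ' rows that share this key
                Next
            End Using

            fileNumber += 1
        Next
    End Sub

End Module
```

For the sample input, calling SplitByKey("input.csv", ".", 0) should produce three files, numbered in the order in which each key first appears.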

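Answer 1: (score: 0)

A more suitable data structure than an array for holding the data would be a Dictionary. It makes it easy to check whether you already have an entry for a particular category (e.g. "apple" or "pear"). Then you only need to either add a new entry to the dictionary or add to an existing entry.

To create the output files, you iterate over each entry in the dictionary (to separate the files), and then over each item in that entry's value (to get the lines in the file).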
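The add-or-append step can be sketched in isolation before looking at the full program. This is a minimal illustration, not the answer's own code: the helper name AddRow is hypothetical, and it uses TryGetValue rather than ContainsKey, which is only an alternative way to do the same check.

```vb
Imports System
Imports System.Collections.Generic

Module GroupingSketch

    ' Hypothetical helper: adds row to the list stored under category,
    ' creating that list the first time the category is seen.
    Sub AddRow(groups As Dictionary(Of String, List(Of String)),
               category As String, row As String)
        Dim rows As List(Of String) = Nothing
        If Not groups.TryGetValue(category, rows) Then
            rows = New List(Of String)()
            groups.Add(category, rows)
        End If
        rows.Add(row)
    End Sub

    Sub Main()
        Dim groups As New Dictionary(Of String, List(Of String))()
        AddRow(groups, "apple", """pie"",""soda"",""beer""")
        AddRow(groups, "apple", """cake"",""milk"",""wine""")
        AddRow(groups, "pear", """pie"",""soda"",""beer""")

        ' Outer loop: one dictionary entry per output file.
        For Each entry As KeyValuePair(Of String, List(Of String)) In groups
            Console.WriteLine("File for key " & entry.Key)
            ' Inner loop: the lines that belong in that file.
            For Each row As String In entry.Value
                Console.WriteLine("  " & row)
            Next
        Next
    End Sub

End Module
```

The full program below follows the same pattern, with TextFieldParser doing the parsing so that quoted fields are handled.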

Option Infer On

Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module Module1

    Sub SeparateCsvToFiles(srcFile As String)

        Dim d As New Dictionary(Of String, List(Of String))
        Dim headers As String()

        Using tfp As New TextFieldParser(srcFile)
            tfp.HasFieldsEnclosedInQuotes = True
            tfp.SetDelimiters(",")
            Dim currentRow As String()

            ' Get the headers
            Try
                headers = tfp.ReadFields()
            Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                Throw New FormatException(String.Format("Could not read header line in ""{0}"".", srcFile))
            End Try

            ' Read the data
            Dim lineNumber As Integer = 1

            While Not tfp.EndOfData
                Try
                    currentRow = tfp.ReadFields()

                    'TODO: Possibly handle the wrong number of entries more gracefully.
                    If currentRow.Count = headers.Count Then
                        ' assume column to sort on is the zeroth one
                        Dim category = currentRow(0)
                        Dim values = String.Join(",", currentRow.Skip(1).Select(Function(s) """" & s & """"))

                        If d.ContainsKey(category) Then
                            d(category).Add(values)
                        Else
                            Dim valuesList As New List(Of String)
                            valuesList.Add(values)
                            d.Add(category, valuesList)
                        End If

                    Else
                        Throw New FormatException(String.Format("Wrong number of entries in line {0} in ""{1}"".", lineNumber, srcFile))
                    End If

                Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
                    Throw New FormatException(String.Format("Could not read data line {0} in ""{1}"".", lineNumber, srcFile))
                End Try

                lineNumber += 1

            End While
        End Using

        ' Output the data
        'TODO: Write code to output files to a different directory.
        Dim destDir = Path.GetDirectoryName(srcFile)

        Dim fileNumber As Integer = 1
        Dim headerLine = String.Join(",", headers.Select(Function(s) """" & s & """"))

        'TODO: think up more meaningful names instead of x and y.    
        For Each x In d
            Dim destFile = Path.Combine(destDir, "output" & fileNumber.ToString() & ".csv")

            Using sr As New StreamWriter(destFile)
                sr.WriteLine(headerLine)
                For Each y In x.Value
                    sr.WriteLine(String.Format("""{0}"",{1}", x.Key, y))
                Next
            End Using

            fileNumber += 1

        Next

    End Sub

    Sub Main()
        SeparateCsvToFiles("C:\temp\input.csv")
        Console.WriteLine("Done.")
        Console.ReadLine()

    End Sub

End Module