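How can I improve the performance of creating a DataTable in VB.NET?

Date: 2012-02-20 22:34:48

Tags: vb.net

I have the following code, and it is very slow on the first load. The CSV file is about 4 MB and 16,000 rows.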

        If Session("tb") Is Nothing Then
            Dim str As String()
            If (IsNothing(Cache("csvdata"))) Then
                str = File.ReadAllLines(Server.MapPath("~/test/feed.csv"))
                Cache.Insert("csvdata", str, Nothing, DateTime.Now.AddHours(12), TimeSpan.Zero)
            Else
                str = CType(Cache("csvdata"), String())
            End If
            Dim dt As New DataTable
            dt.Columns.Add("Shape", GetType(System.String))
            dt.Columns.Add("Weight", GetType(System.Double))
            dt.Columns.Add("Color", GetType(System.String))
            dt.Columns.Add("Clarity", GetType(System.String))
            dt.Columns.Add("Price", GetType(System.Int32))
            dt.Columns.Add("CutGrade", GetType(System.String))

            For i As Integer = 1 To str.Length - 1
                Dim pattern As String = ",(?=([^""]*""[^""]*"")*[^""]*$)"
                Dim rgx As New Regex(pattern)
                Dim t As String = rgx.Replace(str(i), "\")
                Dim s As String() = t.Split("\"c)
                Dim pr As Int32 = CType(s(5), Int32)
                Dim fpr As Int32
                Dim rate As Double
                Select Case pr
                    Case Is < 300
                        rate = 2
                    Case 300 To 600
                        rate = 1.7
                    Case Is > 600
                        rate = 1.16
                End Select
                fpr = Math.Round(pr * rate)
                Dim a As String() = {s(1), s(2), s(3), s(4), fpr, s(40)}
                dt.Rows.Add(a)
            Next

            Session("tb") = dt
            ListView1.DataSource = dt
            ListView1.DataBind()
        Else
            Dim x As DataTable = CType(Session("tb"), DataTable)
            ListView1.DataSource = x
            ListView1.DataBind()
        End If

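The CSV file is cached, which I believe is shared across all users (so only one person pays for the load every 12 hours). Once the Session has been created, the page also loads quickly. So building the DataTable appears to be the slow part. This is my first time working with a DataTable, and I'm sure someone can point out what I'm doing wrong.

Thanks

Update

I have changed the Cache to hold the original DataTable instead of the CSV file. It now loads quickly, but I wonder whether this is a bad idea.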

    Cache.Insert("csvdata", dt, Nothing, DateTime.Now.AddHours(12), TimeSpan.Zero)

Once it is stored in the Cache, I can run queries against it with LINQ.
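For illustration, a query over the cached table might look roughly like the sketch below. The column names come from the code above; the function name `RoundStonesUpTo` and the filter itself are hypothetical, and `AsEnumerable` and `Field(Of T)` assume the project references System.Data.DataSetExtensions.

```vb
Imports System.Collections.Generic
Imports System.Data
Imports System.Linq

Module CachedTableQuery
    ' Hypothetical helper: returns the rows for round stones priced at or
    ' below maxPrice, cheapest first. "Shape" and "Price" are the columns
    ' created in the question's DataTable.
    Function RoundStonesUpTo(dt As DataTable, maxPrice As Integer) As List(Of DataRow)
        Dim query = From row In dt.AsEnumerable()
                    Where row.Field(Of String)("Shape") = "Round" AndAlso
                          row.Field(Of Integer)("Price") <= maxPrice
                    Order By row.Field(Of Integer)("Price")
                    Select row
        Return query.ToList()
    End Function
End Module
```

It could then be called with something like `RoundStonesUpTo(CType(Cache("csvdata"), DataTable), 5000)`.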

SAMPLE CSV, first 3 lines

Supplier ID,Shape,Weight,Color,Clarity,Price / Carat,Lot Number,Stock Number,Lab,Cert #,Certificate Image,2nd Image,Dimension,Depth %,Table %,Crown Angle,Crown %,Pavilion Angle,Pavilion %,Girdle Thinnest,Girdle Thickest,Girdle %,Culet Size,Culet Condition,Polish,Symmetry,Fluor Color,Fluor Intensity,Enhancements,Remarks,Availability,Is Active,FC-Main Body,FC- Intensity,FC- Overtone,Matched Pair,Separable,Matching Stock #,Pavilion,Syndication,Cut Grade,External Url
9349,Round,1.74,F,VVS1,13650.00,,IM-95-188-243,ABC,11228,,,7.81|7.85|4.62,59.00,62.00,34.00,13.00,,,Medium,,0,None,,Excellent,Very Good,Blue,Medium,,"",Not Specified,Y,,,,False,True,,,,Very Good,http://www.test/teste.
9949,Round,1.00,I,VVS1,6059.00,,IM-95-189-C021,ABC,212197,,,6.37|6.42|3.96,61.90,54.00,34.50,16.00,,,Thin,Slightly Thick,0,None,,Excellent,Good,,None,,"Additional pinpoints are not shown.",Guaranteed Available,Y,,,,False,True,,,,Very Good,http://www.test/test.

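2 Answers:

Answer 0: (score: 0)

Use TextFieldParser to read the CSV rather than splitting the strings yourself.

Also, if you use a List(Of CustomClass), where CustomClass has properties such as Shape, Weight, and Color, you avoid the unnecessary overhead of a DataTable, and you can still run LINQ queries against the List.

Forgive my C#; I don't have VB.NET installed on this box.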

    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using Microsoft.VisualBasic.FileIO; // needs a reference to Microsoft.VisualBasic.dll

    public class Gemstone
    {
        public string Shape { get; set; }
        public double Weight { get; set; }
        public string Color { get; set; }
    }

    static class Program
    {
        static void Main(string[] args)
        {
            // allocate the list with your best calculated guess of its final size
            List<Gemstone> list = new List<Gemstone>(16000);
            using (TextFieldParser textFieldParser = new TextFieldParser("data.txt"))
            {
                textFieldParser.Delimiters = new string[] { "," };
                textFieldParser.ReadLine(); // skip header line
                while (!textFieldParser.EndOfData)
                {
                    // ReadFields handles quoted fields that contain commas
                    string[] fields = textFieldParser.ReadFields();
                    Gemstone gemstone = new Gemstone();
                    gemstone.Shape = fields[1];
                    gemstone.Weight = Double.Parse(fields[2], CultureInfo.InvariantCulture);
                    gemstone.Color = fields[3];
                    list.Add(gemstone);
                }
            }
        }
    }
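
Since the question is about VB.NET, here is a rough VB.NET equivalent of the same idea. It is only a sketch: the `GemstoneLoader` module and `LoadGemstones` function are names made up for illustration, and it assumes VB 10 or later (auto-implemented properties) and that the Weight column always parses as an invariant-culture number.

```vb
Imports System.Collections.Generic
Imports System.Globalization
Imports Microsoft.VisualBasic.FileIO

Public Class Gemstone
    Public Property Shape As String
    Public Property Weight As Double
    Public Property Color As String
End Class

Module GemstoneLoader
    ' Hypothetical helper: parses the CSV at the given path into a list,
    ' skipping the header line.
    Function LoadGemstones(path As String) As List(Of Gemstone)
        ' Pre-size the list with a best guess of the final row count.
        Dim list As New List(Of Gemstone)(16000)
        Using parser As New TextFieldParser(path)
            parser.TextFieldType = FieldType.Delimited
            parser.SetDelimiters(",")
            parser.HasFieldsEnclosedInQuotes = True
            parser.ReadLine() ' skip header line
            While Not parser.EndOfData
                Dim fields As String() = parser.ReadFields()
                list.Add(New Gemstone With {
                    .Shape = fields(1),
                    .Weight = Double.Parse(fields(2), CultureInfo.InvariantCulture),
                    .Color = fields(3)
                })
            End While
        End Using
        Return list
    End Function
End Module
```

The resulting list could be placed in the Cache in place of the DataTable and queried with LINQ in the usual way.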

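Answer 1: (score: 0)

FYI, I only just found out about this whole TextFieldParser thing, and I do a lot of text file parsing, so I tested it....

The test file was 11 MB, with about 5,200 rows and 300 columns.

When loading into a DataTable, TextFieldParser ran at about 25% of the speed of what I normally use (File.ReadAllLines plus Split). When I removed the DataTable code, it was about 15% of the speed: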

    ' Test 1: TextFieldParser feeding a DataTable
    Dim DataTable As New DataTable()
    Dim StartTime As Long = Now.Ticks
    Dim Reader As New FileIO.TextFieldParser("file.txt")
    Reader.TextFieldType = FileIO.FieldType.Delimited
    Reader.SetDelimiters(vbTab)
    Reader.HasFieldsEnclosedInQuotes = False
    Dim Header As Boolean = True
    While Not Reader.EndOfData
        Dim Fields() As String = Reader.ReadFields
        If Header Then
            ' The first line is the header: create the columns, add no row
            For I As Integer = 1 To 320
                DataTable.Columns.Add("Col" & I)
            Next
            Header = False
        Else
            ' Skip lines whose first field starts with "#"
            If Mid(Fields(0), 1, 1) <> "#" Then DataTable.Rows.Add(Fields)
        End If
    End While
    Debug.Print((Now.Ticks - StartTime) / 10000 & "ms")

    ' Test 2: File.ReadAllLines plus Split feeding a DataTable
    Dim DataTable2 As New DataTable()
    StartTime = Now.Ticks
    For I As Integer = 1 To 320
        DataTable2.Columns.Add("Col" & I)
    Next
    For Each line As String In System.IO.File.ReadAllLines("file.txt")
        Dim NVP() As String = Split(line, vbTab)
        If Mid(line, 1, 1) <> "#" Then DataTable2.Rows.Add(NVP)
    Next
    Debug.Print((Now.Ticks - StartTime) / 10000 & "ms")

With the DataTable code removed:

    Dim StartTime As Long = Now.Ticks
    Dim Reader As New FileIO.TextFieldParser("file.txt")
    Reader.TextFieldType = FileIO.FieldType.Delimited
    Reader.SetDelimiters(vbTab)
    Reader.HasFieldsEnclosedInQuotes = False
    While Not Reader.EndOfData
        Dim Fields() As String = Reader.ReadFields
    End While
    Debug.Print((Now.Ticks - StartTime) / 10000 & "ms")

    StartTime = Now.Ticks
    For Each line As String In System.IO.File.ReadAllLines("file.txt")
        Dim NVP() As String = Split(line, vbTab)
    Next
    Debug.Print((Now.Ticks - StartTime) / 10000 & "ms")

That surprised me a bit, but I guess the DataTable has more functionality. Another new thing I've found that I'll never use :(
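
If anyone wants to repeat the comparison, System.Diagnostics.Stopwatch is generally a better fit for timing short runs than subtracting Now.Ticks. A minimal sketch, where `MeasureMs` is just an illustrative helper name and not part of any library:

```vb
Imports System
Imports System.Diagnostics

Module TimingSketch
    ' Hypothetical helper: runs the given action once and returns the
    ' elapsed wall-clock time in milliseconds.
    Function MeasureMs(work As Action) As Long
        Dim sw As Stopwatch = Stopwatch.StartNew()
        work()
        sw.Stop()
        Return sw.ElapsedMilliseconds
    End Function
End Module
```

For example: `Debug.Print(MeasureMs(Sub() System.IO.File.ReadAllLines("file.txt")) & "ms")`. Running each test several times and ignoring the first run would also reduce the influence of JIT compilation and disk caching on the numbers.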