如何优化/加速此代码,以便我可以处理大数据集?

时间:2019-04-14 11:34:16

标签: vba

我正在处理一个“大” excel数据集,并且必须查找一个范围内的唯一值并使用该信息来获取:平均值,标准偏差,中位数,最小值和最大值。

所以下面的代码工作,但我需要“循环”通过41000(x)。所以这会有点沉重,所以在我尝试之前,有什么可以在优化方面做的事情吗?

谢谢!

Sub Finddata()
Dim Startdate As Date
Dim Finalrow As Long
Dim EndDate As Date
Dim Targetperiod As Integer
Dim Company As String
Dim i As Long
Dim d As Integer
Dim x As Long
Dim duplicaterow As Integer
Dim Newduplicaterow As Integer
Dim SourceBook As Workbook
Dim Datasheet As Worksheet, Duplicatesheet As Worksheet

Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Application.DisplayStatusBar = False
Application.EnableEvents = False

Set SourceBook = ActiveWorkbook
Set Datasheet = SourceBook.Sheets("Data")
Set Duplicatesheet = SourceBook.Sheets("Duplicate sheet")

Finalrow = Datasheet.Range("A60000").End(xlUp).Row


For x = 2 To 10

Startdate = Datasheet.Range("r" & x)
EndDate = Datasheet.Range("q" & x)
Company = Datasheet.Range("p" & x)
Targetperiod = Datasheet.Range("i" & x)


'Copy data to duplicate sheet
For i = 2 To Finalrow
    If (Cells(i, 17) >= Startdate And Cells(i, 17) <= EndDate And Cells(i, 16) = Company And Cells(i, 9) = Targetperiod) Then
    Range(Cells(i, 1), Cells(i, 19)).Copy
    Duplicatesheet.Range("a10000").End(xlUp).Offset(1, 21).PasteSpecial xlPasteValues
    Duplicatesheet.Range("a10000").End(xlUp).Offset(1, 0).PasteSpecial xlPasteValues
    End If
Next i

'Definition
duplicaterow = Duplicatesheet.Range("v10000").End(xlUp).Row

'Removes duplicate analyst names
Duplicatesheet.Range("v1", "an" & duplicaterow).RemoveDuplicates Columns:=14, Header:=xlYes

'Definition
Newduplicaterow = Duplicatesheet.Range("v10000").End(xlUp).Row + 1

'Removes two columns that are not needed
Duplicatesheet.Range("Am:An").EntireColumn.Delete

'Below is for finding most recent observation and Target price.
d = 2

Do While Duplicatesheet.Cells(d, 38).Value <> ""
    Duplicatesheet.Cells(d, 39).FormulaLocal = "=MAX(IF('Duplicate sheet'!n:n='Duplicate sheet'!Ai" & d & ";'Duplicate sheet'!q:q;))"
    d = d + 1
Loop

  Dim c As Range
  For Each c In Duplicatesheet.Range("Am2", "am" & Newduplicaterow)
    c.FormulaArray = c.FormulaR1C1
  Next c

d = 2

Do While Duplicatesheet.Cells(d, 38).Value <> ""
    Duplicatesheet.Cells(d, 39).Value = Duplicatesheet.Cells(d, 39).Value
    Duplicatesheet.Cells(d, 40).Value = Duplicatesheet.Cells(d, 35) & ", " & Duplicatesheet.Cells(d, 39)
    d = d + 1
Loop

d = 2

Do While Duplicatesheet.Cells(d, 38).Value <> ""
    Duplicatesheet.Cells(d, 41).FormulaLocal = "=index('Duplicate sheet'!d:d;match('Duplicate sheet'!AN" & d & ";'Duplicate sheet'!s:s;0);0)"
    Duplicatesheet.Cells(d, 41).Value = Duplicatesheet.Cells(d, 41).Value
    d = d + 1
Loop


'This section creates the values that are needed in the data sheet, for consensus
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 0).FormulaLocal = "=Average(AO2:AO" & Newduplicaterow - 1 & ")"
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 1).FormulaLocal = "=iferror(STDEV.S(AO2:AO" & Newduplicaterow - 1 & ");count(AO2:AO" & Newduplicaterow - 1 & "))"
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 2).FormulaLocal = "=MEDIAN(AO2:AO" & Newduplicaterow - 1 & ")"
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 3).FormulaLocal = "=Min(AO2:AO" & Newduplicaterow - 1 & ")"
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 4).FormulaLocal = "=max(AO2:AO" & Newduplicaterow - 1 & ")"
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 0).Value = Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 0).Value
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 1).Value = Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 1).Value
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 2).Value = Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 2).Value
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 3).Value = Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 3).Value
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 4).Value = Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 4).Value

Duplicatesheet.Range("v" & Newduplicaterow + 1, "z" & Newduplicaterow + 1).Copy
Datasheet.Range("t" & x).PasteSpecial xlPasteValues

Duplicatesheet.Range("A2:BB6000").ClearContents

Next x


Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
Application.DisplayStatusBar = True
Application.EnableEvents = True

End Sub

1 个答案:

答案 0 :(得分:0)

很难看到数据和工作簿的结构和布局(以防万一,您可以更简单/更高效地完成工作)。但是下面是一些基本观察结果。


您可能应该在下面使用Range.AutoFilter(而不是一次循环浏览成千上万的行)。然后一口气将Range.SpecialCells(xlCellTypeVisible)复制粘贴到您的duplicateSheet中。指定日期过滤条件时,可能需要将日期转换为双精度(使用CDbl()函数)。 (您也可以一次读入一个数组。使用数组可以提高性能,但是您还必须编写更多代码。)

    'Copy data to duplicate sheet
    For i = 2 To finalRow
        If (Cells(i, 17) >= startDate And Cells(i, 17) <= EndDate And Cells(i, 16) = Company And Cells(i, 9) = Targetperiod) Then
            Range(Cells(i, 1), Cells(i, 19)).Copy
            duplicateSheet.Range("a10000").End(xlUp).Offset(1, 21).PasteSpecial xlPasteValues
            duplicateSheet.Range("a10000").End(xlUp).Offset(1, 0).PasteSpecial xlPasteValues
        End If
    Next i

下面,您将应作为数组公式的内容作为非数组公式分配给列AM(每次一行)中的每个单元格;然后再次执行循环(一次一行)以将非数组公式转换为数组公式;然后再次执行循环(一次一行)以将公式转换为静态值。

是否有一个原因,您无法一次设置整个范围的数组公式?我相信Excel将为您处理相对单元格引用。然后将所述范围的公式转换为静态值。 (或者,您可以将范围的值读取到数组中,并在内存中计算条件MAX。但是如前所述,您需要编写更多代码。)

此外,某些较新版本的Excel本身具有MAXIFS函数。如果您有权访问它,请尝试使用它。另一个观察结果是您的数组公式引用了整个列。可能值得限制范围(即到该列中最后使用的行),因此,您只查看<100k单元(例如),而不是一百万。

最后一点是字符串的连接可能会很昂贵。通常,您希望在连接时使用某种形式的JOIN函数,尽管由于仅连接两个值(每个循环迭代),因此不确定在这里对性能有多少好处。

    'Below is for finding most recent observation and Target price.
    d = 2

    Do While duplicateSheet.Cells(d, 38).Value <> ""
        duplicateSheet.Cells(d, 39).FormulaLocal = "=MAX(IF('Duplicate sheet'!n:n='Duplicate sheet'!Ai" & d & ";'Duplicate sheet'!q:q;))"
        d = d + 1
    Loop

    Dim c As Range
    For Each c In duplicateSheet.Range("Am2", "am" & Newduplicaterow)
      c.FormulaArray = c.FormulaR1C1
    Next c

    d = 2

    Do While duplicateSheet.Cells(d, 38).Value <> ""
        duplicateSheet.Cells(d, 39).Value = duplicateSheet.Cells(d, 39).Value
        duplicateSheet.Cells(d, 40).Value = duplicateSheet.Cells(d, 35) & ", " & duplicateSheet.Cells(d, 39)
        d = d + 1
    Loop

您可以一次性将以下公式分配给整个范围。需要注意的一件事是,提供0作为MATCH的第三个参数意味着您正在执行线性搜索(针对每个循环迭代)。考虑使用字典或集合进行更快的查找(字典可能更方便,因为它具有Exists方法)。对于您的情况,我认为S列中的值将是键,而D列中的值将是键的对应值。

此外,您可以一次将整个范围转换为静态值(而不是一次循环一行)。

    d = 2

    Do While duplicateSheet.Cells(d, 38).Value <> ""
        duplicateSheet.Cells(d, 41).FormulaLocal = "=index('Duplicate sheet'!d:d;match('Duplicate sheet'!AN" & d & ";'Duplicate sheet'!s:s;0);0)"
        duplicateSheet.Cells(d, 41).Value = duplicateSheet.Cells(d, 41).Value
        d = d + 1
    Loop

希望这会给您一些从哪里开始的想法。

相关问题