我正在处理一个“大” excel数据集,并且必须查找一个范围内的唯一值并使用该信息来获取:平均值,标准偏差,中位数,最小值和最大值。
所以下面的代码工作,但我需要“循环”通过41000(x)。所以这会有点沉重,所以在我尝试之前,有什么可以在优化方面做的事情吗?
谢谢!
Sub Finddata()
Dim Startdate As Date
Dim Finalrow As Long
Dim EndDate As Date
Dim Targetperiod As Integer
Dim Company As String
Dim i As Long
Dim d As Integer
Dim x As Long
Dim duplicaterow As Integer
Dim Newduplicaterow As Integer
Dim SourceBook As Workbook
Dim Datasheet As Worksheet, Duplicatesheet As Worksheet
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
Application.DisplayStatusBar = False
Application.EnableEvents = False
Set SourceBook = ActiveWorkbook
Set Datasheet = SourceBook.Sheets("Data")
Set Duplicatesheet = SourceBook.Sheets("Duplicate sheet")
Finalrow = Datasheet.Range("A60000").End(xlUp).Row
For x = 2 To 10
Startdate = Datasheet.Range("r" & x)
EndDate = Datasheet.Range("q" & x)
Company = Datasheet.Range("p" & x)
Targetperiod = Datasheet.Range("i" & x)
'Copy data to duplicate sheet
For i = 2 To Finalrow
If (Cells(i, 17) >= Startdate And Cells(i, 17) <= EndDate And Cells(i, 16) = Company And Cells(i, 9) = Targetperiod) Then
Range(Cells(i, 1), Cells(i, 19)).Copy
Duplicatesheet.Range("a10000").End(xlUp).Offset(1, 21).PasteSpecial xlPasteValues
Duplicatesheet.Range("a10000").End(xlUp).Offset(1, 0).PasteSpecial xlPasteValues
End If
Next i
'Definition
duplicaterow = Duplicatesheet.Range("v10000").End(xlUp).Row
'Removes duplicate analyst names
Duplicatesheet.Range("v1", "an" & duplicaterow).RemoveDuplicates Columns:=14, Header:=xlYes
'Definition
Newduplicaterow = Duplicatesheet.Range("v10000").End(xlUp).Row + 1
'Removes two columns that are not needed
Duplicatesheet.Range("Am:An").EntireColumn.Delete
'Below is for finding most recent observation and Target price.
d = 2
Do While Duplicatesheet.Cells(d, 38).Value <> ""
Duplicatesheet.Cells(d, 39).FormulaLocal = "=MAX(IF('Duplicate sheet'!n:n='Duplicate sheet'!Ai" & d & ";'Duplicate sheet'!q:q;))"
d = d + 1
Loop
Dim c As Range
For Each c In Duplicatesheet.Range("Am2", "am" & Newduplicaterow)
c.FormulaArray = c.FormulaR1C1
Next c
d = 2
Do While Duplicatesheet.Cells(d, 38).Value <> ""
Duplicatesheet.Cells(d, 39).Value = Duplicatesheet.Cells(d, 39).Value
Duplicatesheet.Cells(d, 40).Value = Duplicatesheet.Cells(d, 35) & ", " & Duplicatesheet.Cells(d, 39)
d = d + 1
Loop
d = 2
Do While Duplicatesheet.Cells(d, 38).Value <> ""
Duplicatesheet.Cells(d, 41).FormulaLocal = "=index('Duplicate sheet'!d:d;match('Duplicate sheet'!AN" & d & ";'Duplicate sheet'!s:s;0);0)"
Duplicatesheet.Cells(d, 41).Value = Duplicatesheet.Cells(d, 41).Value
d = d + 1
Loop
'This section creates the values that are needed in the data sheet, for consensus
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 0).FormulaLocal = "=Average(AO2:AO" & Newduplicaterow - 1 & ")"
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 1).FormulaLocal = "=iferror(STDEV.S(AO2:AO" & Newduplicaterow - 1 & ");count(AO2:AO" & Newduplicaterow - 1 & "))"
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 2).FormulaLocal = "=MEDIAN(AO2:AO" & Newduplicaterow - 1 & ")"
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 3).FormulaLocal = "=Min(AO2:AO" & Newduplicaterow - 1 & ")"
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 4).FormulaLocal = "=max(AO2:AO" & Newduplicaterow - 1 & ")"
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 0).Value = Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 0).Value
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 1).Value = Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 1).Value
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 2).Value = Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 2).Value
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 3).Value = Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 3).Value
Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 4).Value = Duplicatesheet.Range("v" & Newduplicaterow + 1).Offset(0, 4).Value
Duplicatesheet.Range("v" & Newduplicaterow + 1, "z" & Newduplicaterow + 1).Copy
Datasheet.Range("t" & x).PasteSpecial xlPasteValues
Duplicatesheet.Range("A2:BB6000").ClearContents
Next x
Application.Calculation = xlCalculationAutomatic
Application.ScreenUpdating = True
Application.DisplayStatusBar = True
Application.EnableEvents = True
End Sub
答案 0 :(得分:0)
很难看到数据和工作簿的结构和布局(以防万一,您可以更简单/更高效地完成工作)。但是下面是一些基本观察结果。
您可能应该在下面使用Range.AutoFilter
(而不是一次循环浏览成千上万的行)。然后一口气将Range.SpecialCells(xlCellTypeVisible)
复制粘贴到您的duplicateSheet
中。指定日期过滤条件时,可能需要将日期转换为双精度(使用CDbl()
函数)。 (您也可以一次读入一个数组。使用数组可以提高性能,但是您还必须编写更多代码。)
'Copy data to duplicate sheet
For i = 2 To finalRow
If (Cells(i, 17) >= startDate And Cells(i, 17) <= EndDate And Cells(i, 16) = Company And Cells(i, 9) = Targetperiod) Then
Range(Cells(i, 1), Cells(i, 19)).Copy
duplicateSheet.Range("a10000").End(xlUp).Offset(1, 21).PasteSpecial xlPasteValues
duplicateSheet.Range("a10000").End(xlUp).Offset(1, 0).PasteSpecial xlPasteValues
End If
Next i
下面,您将应作为数组公式的内容作为非数组公式分配给列AM
(每次一行)中的每个单元格;然后再次执行循环(一次一行)以将非数组公式转换为数组公式;然后再次执行循环(一次一行)以将公式转换为静态值。
是否有一个原因,您无法一次设置整个范围的数组公式?我相信Excel将为您处理相对单元格引用。然后将所述范围的公式转换为静态值。 (或者,您可以将范围的值读取到数组中,并在内存中计算条件MAX
。但是如前所述,您需要编写更多代码。)
此外,某些较新版本的Excel本身具有MAXIFS
函数。如果您有权访问它,请尝试使用它。另一个观察结果是您的数组公式引用了整个列。可能值得限制范围(即到该列中最后使用的行),因此,您只查看<100k单元(例如),而不是一百万。
最后一点是字符串的连接可能会很昂贵。通常,您希望在连接时使用某种形式的JOIN
函数,尽管由于仅连接两个值(每个循环迭代),因此不确定在这里对性能有多少好处。
'Below is for finding most recent observation and Target price.
d = 2
Do While duplicateSheet.Cells(d, 38).Value <> ""
duplicateSheet.Cells(d, 39).FormulaLocal = "=MAX(IF('Duplicate sheet'!n:n='Duplicate sheet'!Ai" & d & ";'Duplicate sheet'!q:q;))"
d = d + 1
Loop
Dim c As Range
For Each c In duplicateSheet.Range("Am2", "am" & Newduplicaterow)
c.FormulaArray = c.FormulaR1C1
Next c
d = 2
Do While duplicateSheet.Cells(d, 38).Value <> ""
duplicateSheet.Cells(d, 39).Value = duplicateSheet.Cells(d, 39).Value
duplicateSheet.Cells(d, 40).Value = duplicateSheet.Cells(d, 35) & ", " & duplicateSheet.Cells(d, 39)
d = d + 1
Loop
您可以一次性将以下公式分配给整个范围。需要注意的一件事是,提供0
作为MATCH
的第三个参数意味着您正在执行线性搜索(针对每个循环迭代)。考虑使用字典或集合进行更快的查找(字典可能更方便,因为它具有Exists
方法)。对于您的情况,我认为S
列中的值将是键,而D
列中的值将是键的对应值。
此外,您可以一次将整个范围转换为静态值(而不是一次循环一行)。
d = 2
Do While duplicateSheet.Cells(d, 38).Value <> ""
duplicateSheet.Cells(d, 41).FormulaLocal = "=index('Duplicate sheet'!d:d;match('Duplicate sheet'!AN" & d & ";'Duplicate sheet'!s:s;0);0)"
duplicateSheet.Cells(d, 41).Value = duplicateSheet.Cells(d, 41).Value
d = d + 1
Loop
希望这会给您一些从哪里开始的想法。