比较两个列并删除间隙使用宏

时间:2016-04-30 20:53:04

标签: excel-vba macros excel-2013 vba excel

我有一个Excel文件,我想比较A列和D列中的日期并删除它们之间的差距。 例如基于这张图片 enter image description here A列的时间从14:56:23开始,D在14:56:18。所以我想要删除D列中的所有数据直到14:56:23,这样它在A和D中都是相同的。 这个问题会重复多次,所以我想开发一个宏来做它。

这是一个小程序,只比较A列和D列中的前两个单元格

Sub Edit_Date_time()

Dim r As Range
Dim l As Range

Set r = Range("A2")
Set l = Range("D2")

    If r.Value <> l.Value Then
        Range("D2:E2").Select
        Selection.Delete Shift:=xlUp
        End If


   End Sub

问题是细胞包含日期和时间。时间,所以我无法将其作为值进行比较。我还要扩展此代码以涵盖整个A2&amp; D2列不仅是前两个单元格。

2 个答案:

答案 0 :(得分:1)

你的问题已经变形了很多次,但是我想问的是“如何在不匹配时间戳的情况下消除行”......

比较时间戳可能很棘手,即使它们格式正确也是如此。您希望2/17/2016 14:56:29等于2/17/2016 14:56:29,但是在字符串或一般格式中您看不到的毫秒差异。因此,在确定&lt ;,&gt;或=。

时应使用容差

请记住,时间戳为1.0 = 1天。所以1/10秒(1/24/60/60/10)。

我假设您希望在这些时间戳早于相应的温度时间戳时删除压力行,并且您希望在这些时间戳早于相应的压力时间戳时删除温度行。

这意味着,在最坏的情况下,循环需要遍历整个数据集两次。

我根据这些数据测试了代码...

enter image description here

处理后,黄色单元格应对齐,橙色行应删除。以下是我得到的结果......

enter image description here

使用此代码...

Sub ParseDateTime()
Dim TRange As Range, PRange As Range
Dim iLoop As Long, LoopEnd As Long
Dim theRow As Long, LastRow As Long

' set the range for the temperature data
LastRow = Sheets("Sheet1").Range("A" & Sheets("Sheet1").Rows.Count).End(xlUp).Row
Set TRange = Sheets("Sheet1").Range("A1:C" & LastRow)
LoopEnd = LastRow

' set the range for the pressure data
LastRow = Sheets("Sheet1").Range("D" & Sheets("Sheet1").Rows.Count).End(xlUp).Row
Set PRange = Sheets("Sheet1").Range("D1:E" & LastRow)
If LastRow > LoopEnd Then LoopEnd = LastRow

' loop through the range
theRow = 1
For iLoop = 2 To 2 * LoopEnd
    theRow = theRow + 1
' stop searching when no more data
    If TRange(theRow, 1) = "" And PRange(theRow, 1) = "" Then Exit For
' if out of temperature data, eliminate the rest of the pressure data
    If TRange(theRow, 1) = "" Then
        PRange.Rows(theRow).Delete Shift:=xlUp
        theRow = theRow - 1
    End If
' if out of pressure data, eliminate the rest of the temperature data
    If PRange(theRow, 1) = "" Then
        TRange.Rows(theRow).Delete Shift:=xlUp
        theRow = theRow - 1
    End If
' eliminate pressure rows where the time stamp is earlier than the temperature timestamp
    If (TRange(theRow, 1).Value > PRange(theRow, 1).Value) And _
       (Abs(TRange(theRow, 1).Value - PRange(theRow, 1).Value) >= (1# / 24# / 60# / 60# / 10#)) Then
        PRange.Rows(theRow).Delete Shift:=xlUp
        theRow = theRow - 1
    Else
' eliminate temperature rows where the time stamp is earlier than the pressure timestamp
        If (TRange(theRow, 1).Value < PRange(theRow, 1).Value) And _
           (Abs(TRange(theRow, 1).Value - PRange(theRow, 1).Value) >= (1# / 24# / 60# / 60# / 10#)) Then
            TRange.Rows(theRow).Delete Shift:=xlUp
            theRow = theRow - 1
        End If
    End If
Next iLoop

End Sub

答案 1 :(得分:0)

这个问题的重新定义使得难以处理;特别是因为一些(现在已删除)标准会使当前问题的某些解决方案变得不切实际。

我记得你的数据来自多个CSV文件;一些含有温度,一些含有压力。事实上,有太多的数据可以让人想到“溢出”到另一张工作表。仅此事实使得单个工作表值比较变得不切实际。即使它完全适用于单个工作表,将一百万个日期时间与第二个百万个日期时间进行比较,并删除不适合这两个类别的条目将是一项艰巨而耗时的任务。

艰苦而耗时的任务最好在“内存中”处理。反复返回工作表来比较值会导致处理陷入困境,除非绝对必要,否则应该避免使用。

这似乎应该是一个SQL问题,其中两组不同的CSV被加载到两个临时但统一的数据库表中,并在各自的日期时间内编入索引。然后可以执行INNER JOIN来构建匹配记录的第三个表。易于peasy。

但这是一个 excel / vba 问题,应该以实物回答。

VBA Scripting.Dictionary对象就像一个内存数据库表,并带有一个名为key的唯一主要“索引”。它还具有变体类型的单个附加“字段”,其可以接收变体可以具有的任何样式的值或值。使用日期时间作为键来加载两个带有相应值的字典(一个用于温度,另一个用于压力)似乎是将两者结合起来的最有效方法。

<强> Sample data

我从几个类似于以下内容的CSV开始。

Temperature_Pressure_CSVs_sample3 Temperature_Pressure_CSVs_sample4
Temperaturen-25.csv SPS-25.csv

三个温度CSV和三个压力CSV总计约300K记录(每个约150K),每个记录的故意缺失日期时间。

<强> Module2 (Code)

Option Explicit

'public constant dictating the maximum number of entries per worksheet (never set higher than Rows.Count-3)
Public Const iMAXROWS As Long = 50000

Sub main()
    Dim fp As String, fn As String, tmp As Variant
    Dim dt As Variant, tdic As Object, pdic As Object
    Dim tpwb As Workbook, a As Long, d As Long, w As Long

    'apptggl btggl:=false   'uncomment this when you have finished debugging

    'create 2 dictionary objects to receive ALL of the data
    Set tdic = CreateObject("Scripting.Dictionary")
    Set pdic = CreateObject("Scripting.Dictionary")
    tdic.CompareMode = vbBinaryCompare
    pdic.CompareMode = vbBinaryCompare

    'load the dictionaries using the overwrite method
    fp = Environ("TMP") & Chr(92) & "TempPress"
    fn = Dir(fp & Chr(92) & "*.csv", vbNormal)
    Do While CBool(Len(fn))
        Select Case True
            Case LCase(fn) Like "*temperaturen*"
                'debug.Print "found " & fn
                loadTPDictionary CStr(fp & Chr(92) & fn), tdic, 3
            Case LCase(fn) Like "*sps*"
                'debug.Print "found " & fn
                loadTPDictionary CStr(fp & Chr(92) & fn), pdic, 2
            Case Else
                'do nothing; not temperature or pressure
        End Select
        'debug.Print tdic.Count & ":" & pdic.Count
        fn = Dir
    Loop

    'debug.Print tdic.Count
    'debug.Print pdic.Count

    'At this point you have two dictionary object; one for temps and one for pressures
    'They have a unique indexed key on their datetime values
    'Time to merge the two

    'First load all matching pressures into the temperatures
    For Each dt In tdic
        If pdic.Exists(dt) Then
            tdic.Item(dt) = Array(tdic.Item(dt)(0), tdic.Item(dt)(1), tdic.Item(dt)(2), _
                                  pdic.Item(dt)(1), pdic.Item(dt)(0))
        End If
    Next dt

    'Second, get rid of temps that had no matching pressure
    For Each dt In tdic
        If UBound(tdic.Item(dt)) < 4 Then
            tdic.Remove dt
        End If
    Next dt

    'debug.Print tdic.Count
    'debug.Print pdic.Count

    'At this point the temp dictionary object contains a merged set of matching temps and pressures
    'Time to put the values into one or more worksheets

    'create a new target workbook and set up the first target worksheet
    Set tpwb = Workbooks.Add
    With tpwb
        For w = 1 To Int(tdic.Count / iMAXROWS) + 1
            a = 1: d = 1
            'first load an array with the dictionary's values
            ReDim tmp(1 To iMAXROWS, 1 To 5)
            For Each dt In tdic
                If d > (w * iMAXROWS) Then
                    Exit For
                ElseIf d > ((w - 1) * iMAXROWS) Then
                    tmp(a, 1) = tdic.Item(dt)(0)
                    tmp(a, 2) = tdic.Item(dt)(1)
                    tmp(a, 3) = tdic.Item(dt)(2)
                    tmp(a, 4) = tdic.Item(dt)(3)
                    tmp(a, 5) = tdic.Item(dt)(4)
                    a = a + 1
                End If
                d = d + 1
            Next dt

            On Error GoTo bm_Need_Worksheet
            With .Worksheets(w + 1) '<~~ ignore the original blank worksheet from the new workbook
                'dump the values back into the worksheet
                .Cells(2, 1).Resize(UBound(tmp, 1), UBound(tmp, 2)) = tmp
                'format the datetimes
                .Range("A2:A" & UBound(tmp, 1) + 1 & ",E2:E" & UBound(tmp, 1) + 1).NumberFormat = _
                    "[Color10]mm/dd/yyyy hh:mm:ss"
                .Columns("A:E").AutoFit
            End With
            'clear the variant array
            Erase tmp
        Next w
        'get rid of the original unprepped worksheet created with the new workbook
        .Worksheets(1).Delete
        'save as a binary workbook due to size considerations
        .SaveAs Filename:=fp & Chr(92) & Format(Date, "\T\P\_yyyymmdd\_") & CLng(Timer), _
                       FileFormat:=xlExcel12, AddToMru:=True
        'close savechanges:=false   'uncomment this after debugging
    End With

    'we got safely here; skip over worksheet creation
    GoTo bm_Safe_Exit

bm_Need_Worksheet:
    On Error GoTo 0
    With tpwb.Worksheets.Add(After:=Sheets(Sheets.Count))
        On Error GoTo bm_Need_Worksheet
        .Range("A1:E1") = Array("Date and Time", "Temperature 1", "Temperature 2", _
                                "Pressure", "Date and Time (p)")
        .Name = "Temperaturen & Pressure " & w
        With .Parent.Windows(1)
            .SplitColumn = 0: .SplitRow = 1
            .FreezePanes = True
            .Zoom = 75
        End With
    End With
    Resume

bm_Safe_Exit:
    'discard the dictionary objects
    tdic.RemoveAll: Set tdic = Nothing
    pdic.RemoveAll: Set pdic = Nothing
    'restore the application environment
    appTGGL

End Sub

Sub loadTPDictionary(fpn As String, ByRef dict As Object, flds As Long)
    Dim f As Long, v As Long, vVALs As Variant, wb As Workbook
    Workbooks.OpenText Filename:=fpn, StartRow:=1, DataType:=xlDelimited, _
                            ConsecutiveDelimiter:=False, _
                            Comma:=True, Tab:=False, Semicolon:=False, Space:=False, Other:=False, _
                            FieldInfo:=IIf(flds = 3, Array(Array(1, 3), Array(2, 1), Array(3, 1)), _
                                                     Array(Array(1, 3), Array(2, 1)))
    With ActiveWorkbook
        With Worksheets(1)
            'Debug.Print .Cells(1, 1).Value
            vVALs = .Range(.Cells(2, 1), .Cells(Rows.Count, flds).End(xlUp)).Value2
        End With
        .Close SaveChanges:=False
    End With

    If flds = 3 Then
        For v = LBound(vVALs, 1) To UBound(vVALs, 1)
            'fastest load method but overwrites duplicate datetime values with the last temp1, temp2
            dict.Item(vVALs(v, 1)) = Array(vVALs(v, 1), vVALs(v, 2), vVALs(v, 3))
        Next v
    Else
        For v = LBound(vVALs, 1) To UBound(vVALs, 1)
            'fastest load method but overwrites duplicate datetime values with the last pressure
            dict.Item(vVALs(v, 1)) = Array(vVALs(v, 1), vVALs(v, 2))
        Next v
    End If

    Erase vVALs

End Sub

Public Sub appTGGL(Optional bTGGL As Boolean = True)
    With Application
        .ScreenUpdating = bTGGL
        .EnableEvents = bTGGL
        .DisplayAlerts = bTGGL
        .AutoRecover.Enabled = bTGGL   'no interruptions with an auto-save
        .Calculation = IIf(bTGGL, xlCalculationAutomatic, xlCalculationManual)
        .CutCopyMode = False
        .StatusBar = vbNullString
    End With
    Debug.Print Timer
End Sub

请参阅代码内注释以遵循程序流程。我认真地建议您从较小的样本数据集开始,并使用 F8 ctrl + F8 完成代码。在变量上设置一些watches。我留下了许多Debug.Print语句,可以取消注释,并通过VBE Immediate window观察其信息。

顺便说一句,我的默认工作簿是使用单个工作表创建的,而不是默认工作表。如果您仍然打开包含三个空白工作表的新工作簿,则可能需要在创建新目标Worksheet Object后立即调整代码以删除除空白Workbook Object之外的所有内容。创建新工作表以接收数据,并在创建时进行适当格式化。

<强> Results

虽然结果产生得足够快,但我认为~150K记录(加工后约135K)足以进行测试。这些结果被分成多个工作表,因为iMAXROWS常量我设置为每个工作表50K。

Temperature_Pressure_CSVs_results
TP_20160501_65489.xlsb

  

150K + 150K在约29秒内处理成~140K合并记录。

您可能还希望认真考虑转向数据库解决方案。

有关处理大型工作表的提示,请参阅 Highlight Duplicates and Filter by color alternative