我有一个Excel文件,我想比较A列和D列中的日期并删除它们之间的差距。 例如基于这张图片 enter image description here A列的时间从14:56:23开始,D在14:56:18。所以我想要删除D列中的所有数据直到14:56:23,这样它在A和D中都是相同的。 这个问题会重复多次,所以我想开发一个宏来做它。
这是一个小程序,只比较A列和D列中的前两个单元格
Sub Edit_Date_time()
Dim r As Range
Dim l As Range
Set r = Range("A2")
Set l = Range("D2")
If r.Value <> l.Value Then
Range("D2:E2").Select
Selection.Delete Shift:=xlUp
End If
End Sub
问题是细胞包含日期和时间。时间,所以我无法将其作为值进行比较。我还要扩展此代码以涵盖整个A2&amp; D2列不仅是前两个单元格。
答案 0 :(得分:1)
你的问题已经变形了很多次,但是我想问的是“如何在不匹配时间戳的情况下消除行”......
比较时间戳可能很棘手,即使它们格式正确也是如此。您希望2/17/2016 14:56:29
等于2/17/2016 14:56:29
,但是在字符串或一般格式中您看不到的毫秒差异。因此,在确定&lt ;,&gt;或=。
请记住,时间戳为1.0 = 1天。所以1/10秒(1/24/60/60/10)。
我假设您希望在这些时间戳早于相应的温度时间戳时删除压力行,并且您希望在这些时间戳早于相应的压力时间戳时删除温度行。
这意味着,在最坏的情况下,循环需要遍历整个数据集两次。
我根据这些数据测试了代码...
处理后,黄色单元格应对齐,橙色行应删除。以下是我得到的结果......
使用此代码...
Sub ParseDateTime()
Dim TRange As Range, PRange As Range
Dim iLoop As Long, LoopEnd As Long
Dim theRow As Long, LastRow As Long
' set the range for the temperature data
LastRow = Sheets("Sheet1").Range("A" & Sheets("Sheet1").Rows.Count).End(xlUp).Row
Set TRange = Sheets("Sheet1").Range("A1:C" & LastRow)
LoopEnd = LastRow
' set the range for the pressure data
LastRow = Sheets("Sheet1").Range("D" & Sheets("Sheet1").Rows.Count).End(xlUp).Row
Set PRange = Sheets("Sheet1").Range("D1:E" & LastRow)
If LastRow > LoopEnd Then LoopEnd = LastRow
' loop through the range
theRow = 1
For iLoop = 2 To 2 * LoopEnd
theRow = theRow + 1
' stop searching when no more data
If TRange(theRow, 1) = "" And PRange(theRow, 1) = "" Then Exit For
' if out of temperature data, eliminate the rest of the pressure data
If TRange(theRow, 1) = "" Then
PRange.Rows(theRow).Delete Shift:=xlUp
theRow = theRow - 1
End If
' if out of pressure data, eliminate the rest of the temperature data
If PRange(theRow, 1) = "" Then
TRange.Rows(theRow).Delete Shift:=xlUp
theRow = theRow - 1
End If
' eliminate pressure rows where the time stamp is earlier than the temperature timestamp
If (TRange(theRow, 1).Value > PRange(theRow, 1).Value) And _
(Abs(TRange(theRow, 1).Value - PRange(theRow, 1).Value) >= (1# / 24# / 60# / 60# / 10#)) Then
PRange.Rows(theRow).Delete Shift:=xlUp
theRow = theRow - 1
Else
' eliminate temperature rows where the time stamp is earlier than the pressure timestamp
If (TRange(theRow, 1).Value < PRange(theRow, 1).Value) And _
(Abs(TRange(theRow, 1).Value - PRange(theRow, 1).Value) >= (1# / 24# / 60# / 60# / 10#)) Then
TRange.Rows(theRow).Delete Shift:=xlUp
theRow = theRow - 1
End If
End If
Next iLoop
End Sub
答案 1 :(得分:0)
这个问题的重新定义使得难以处理;特别是因为一些(现在已删除)标准会使当前问题的某些解决方案变得不切实际。
我记得你的数据来自多个CSV文件;一些含有温度,一些含有压力。事实上,有太多的数据可以让人想到“溢出”到另一张工作表。仅此事实使得单个工作表值比较变得不切实际。即使它完全适用于单个工作表,将一百万个日期时间与第二个百万个日期时间进行比较,并删除不适合这两个类别的条目将是一项艰巨而耗时的任务。
艰苦而耗时的任务最好在“内存中”处理。反复返回工作表来比较值会导致处理陷入困境,除非绝对必要,否则应该避免使用。
这似乎应该是一个SQL问题,其中两组不同的CSV被加载到两个临时但统一的数据库表中,并在各自的日期时间内编入索引。然后可以执行INNER JOIN来构建匹配记录的第三个表。易于peasy。
但这是一个 excel / vba 问题,应该以实物回答。
VBA Scripting.Dictionary对象就像一个内存数据库表,并带有一个名为key的唯一主要“索引”。它还具有变体类型的单个附加“字段”,其可以接收变体可以具有的任何样式的值或值。使用日期时间作为键来加载两个带有相应值的字典(一个用于温度,另一个用于压力)似乎是将两者结合起来的最有效方法。
<强> Sample data 强>
我从几个类似于以下内容的CSV开始。
Temperaturen-25.csv SPS-25.csv
三个温度CSV和三个压力CSV总计约300K记录(每个约150K),每个记录的故意缺失日期时间。
<强> Module2 (Code) 强>
Option Explicit
'public constant dictating the maximum number of entries per worksheet (never set higher than Rows.Count-3)
Public Const iMAXROWS As Long = 50000
Sub main()
Dim fp As String, fn As String, tmp As Variant
Dim dt As Variant, tdic As Object, pdic As Object
Dim tpwb As Workbook, a As Long, d As Long, w As Long
'apptggl btggl:=false 'uncomment this when you have finished debugging
'create 2 dictionary objects to receive ALL of the data
Set tdic = CreateObject("Scripting.Dictionary")
Set pdic = CreateObject("Scripting.Dictionary")
tdic.CompareMode = vbBinaryCompare
pdic.CompareMode = vbBinaryCompare
'load the dictionaries using the overwrite method
fp = Environ("TMP") & Chr(92) & "TempPress"
fn = Dir(fp & Chr(92) & "*.csv", vbNormal)
Do While CBool(Len(fn))
Select Case True
Case LCase(fn) Like "*temperaturen*"
'debug.Print "found " & fn
loadTPDictionary CStr(fp & Chr(92) & fn), tdic, 3
Case LCase(fn) Like "*sps*"
'debug.Print "found " & fn
loadTPDictionary CStr(fp & Chr(92) & fn), pdic, 2
Case Else
'do nothing; not temperature or pressure
End Select
'debug.Print tdic.Count & ":" & pdic.Count
fn = Dir
Loop
'debug.Print tdic.Count
'debug.Print pdic.Count
'At this point you have two dictionary object; one for temps and one for pressures
'They have a unique indexed key on their datetime values
'Time to merge the two
'First load all matching pressures into the temperatures
For Each dt In tdic
If pdic.Exists(dt) Then
tdic.Item(dt) = Array(tdic.Item(dt)(0), tdic.Item(dt)(1), tdic.Item(dt)(2), _
pdic.Item(dt)(1), pdic.Item(dt)(0))
End If
Next dt
'Second, get rid of temps that had no matching pressure
For Each dt In tdic
If UBound(tdic.Item(dt)) < 4 Then
tdic.Remove dt
End If
Next dt
'debug.Print tdic.Count
'debug.Print pdic.Count
'At this point the temp dictionary object contains a merged set of matching temps and pressures
'Time to put the values into one or more worksheets
'create a new target workbook and set up the first target worksheet
Set tpwb = Workbooks.Add
With tpwb
For w = 1 To Int(tdic.Count / iMAXROWS) + 1
a = 1: d = 1
'first load an array with the dictionary's values
ReDim tmp(1 To iMAXROWS, 1 To 5)
For Each dt In tdic
If d > (w * iMAXROWS) Then
Exit For
ElseIf d > ((w - 1) * iMAXROWS) Then
tmp(a, 1) = tdic.Item(dt)(0)
tmp(a, 2) = tdic.Item(dt)(1)
tmp(a, 3) = tdic.Item(dt)(2)
tmp(a, 4) = tdic.Item(dt)(3)
tmp(a, 5) = tdic.Item(dt)(4)
a = a + 1
End If
d = d + 1
Next dt
On Error GoTo bm_Need_Worksheet
With .Worksheets(w + 1) '<~~ ignore the original blank worksheet from the new workbook
'dump the values back into the worksheet
.Cells(2, 1).Resize(UBound(tmp, 1), UBound(tmp, 2)) = tmp
'format the datetimes
.Range("A2:A" & UBound(tmp, 1) + 1 & ",E2:E" & UBound(tmp, 1) + 1).NumberFormat = _
"[Color10]mm/dd/yyyy hh:mm:ss"
.Columns("A:E").AutoFit
End With
'clear the variant array
Erase tmp
Next w
'get rid of the original unprepped worksheet created with the new workbook
.Worksheets(1).Delete
'save as a binary workbook due to size considerations
.SaveAs Filename:=fp & Chr(92) & Format(Date, "\T\P\_yyyymmdd\_") & CLng(Timer), _
FileFormat:=xlExcel12, AddToMru:=True
'close savechanges:=false 'uncomment this after debugging
End With
'we got safely here; skip over worksheet creation
GoTo bm_Safe_Exit
bm_Need_Worksheet:
On Error GoTo 0
With tpwb.Worksheets.Add(After:=Sheets(Sheets.Count))
On Error GoTo bm_Need_Worksheet
.Range("A1:E1") = Array("Date and Time", "Temperature 1", "Temperature 2", _
"Pressure", "Date and Time (p)")
.Name = "Temperaturen & Pressure " & w
With .Parent.Windows(1)
.SplitColumn = 0: .SplitRow = 1
.FreezePanes = True
.Zoom = 75
End With
End With
Resume
bm_Safe_Exit:
'discard the dictionary objects
tdic.RemoveAll: Set tdic = Nothing
pdic.RemoveAll: Set pdic = Nothing
'restore the application environment
appTGGL
End Sub
Sub loadTPDictionary(fpn As String, ByRef dict As Object, flds As Long)
Dim f As Long, v As Long, vVALs As Variant, wb As Workbook
Workbooks.OpenText Filename:=fpn, StartRow:=1, DataType:=xlDelimited, _
ConsecutiveDelimiter:=False, _
Comma:=True, Tab:=False, Semicolon:=False, Space:=False, Other:=False, _
FieldInfo:=IIf(flds = 3, Array(Array(1, 3), Array(2, 1), Array(3, 1)), _
Array(Array(1, 3), Array(2, 1)))
With ActiveWorkbook
With Worksheets(1)
'Debug.Print .Cells(1, 1).Value
vVALs = .Range(.Cells(2, 1), .Cells(Rows.Count, flds).End(xlUp)).Value2
End With
.Close SaveChanges:=False
End With
If flds = 3 Then
For v = LBound(vVALs, 1) To UBound(vVALs, 1)
'fastest load method but overwrites duplicate datetime values with the last temp1, temp2
dict.Item(vVALs(v, 1)) = Array(vVALs(v, 1), vVALs(v, 2), vVALs(v, 3))
Next v
Else
For v = LBound(vVALs, 1) To UBound(vVALs, 1)
'fastest load method but overwrites duplicate datetime values with the last pressure
dict.Item(vVALs(v, 1)) = Array(vVALs(v, 1), vVALs(v, 2))
Next v
End If
Erase vVALs
End Sub
Public Sub appTGGL(Optional bTGGL As Boolean = True)
With Application
.ScreenUpdating = bTGGL
.EnableEvents = bTGGL
.DisplayAlerts = bTGGL
.AutoRecover.Enabled = bTGGL 'no interruptions with an auto-save
.Calculation = IIf(bTGGL, xlCalculationAutomatic, xlCalculationManual)
.CutCopyMode = False
.StatusBar = vbNullString
End With
Debug.Print Timer
End Sub
请参阅代码内注释以遵循程序流程。我认真地建议您从较小的样本数据集开始,并使用 F8 和 ctrl + F8 完成代码。在变量上设置一些watches。我留下了许多Debug.Print
语句,可以取消注释,并通过VBE Immediate window观察其信息。
顺便说一句,我的默认工作簿是使用单个工作表创建的,而不是默认工作表。如果您仍然打开包含三个空白工作表的新工作簿,则可能需要在创建新目标Worksheet Object后立即调整代码以删除除空白Workbook Object之外的所有内容。创建新工作表以接收数据,并在创建时进行适当格式化。
<强> Results 强>
虽然结果产生得足够快,但我认为~150K记录(加工后约135K)足以进行测试。这些结果被分成多个工作表,因为iMAXROWS常量我设置为每个工作表50K。
150K + 150K在约29秒内处理成~140K合并记录。
您可能还希望认真考虑转向数据库解决方案。
有关处理大型工作表的提示,请参阅 Highlight Duplicates and Filter by color alternative 。