如何在每半小时后获得最接近的日期

时间:2016-06-20 10:39:34

标签: excel vba excel-vba datetime large-data

我有一个非常大的数据集,在

中看起来像这样
temp_dataframe['PPI'].str.replace('PPI/','')

现在数据集中的总行数约为5468389,这对于excel来说非常大,可以导入一列中的所有内容,因此我尝试处理部分数据。

还有其他方法吗?我可以通过它处理所有数据? 我尝试直接读取和写入文本,但每当我尝试将其作为日期读取时,由于格式的原因,它会给我 Column A Date 2016-02-29 15:59:59.674 2016-02-29 15:59:59.695 2016-02-29 15:59:59.716 2016-02-29 15:59:59.752 2016-02-29 15:59:59.804 2016-02-29 15:59:59.869 2016-02-29 15:59:59.888 2016-02-29 15:59:59.941 2016-02-29 16:00:00.081 <-- get closest date since .081 < .941 2016-02-29 16:00:00.168 2016-02-29 16:00:00.189 2016-02-29 16:00:00.198 2016-02-29 16:00:00.247 2016-02-29 16:00:00.311 2016-02-29 16:00:00.345 2016-02-29 16:00:00.357 and for the other half an hour 2016-02-29 16:29:58.628 2016-02-29 16:29:58.639 2016-02-29 16:29:58.689 2016-02-29 16:29:58.706 2016-02-29 16:29:58.761 2016-02-29 16:29:58.865 2016-02-29 16:29:59.142 2016-02-29 16:29:59.542 2016-02-29 16:29:59.578 2016-02-29 16:30:00.171 <-- Get this date since .171 < .578 2016-02-29 16:30:00.209 2016-02-29 16:30:00.217 2016-02-29 16:30:00.245 2016-02-29 16:30:00.254 2016-02-29 16:30:00.347 2016-02-29 16:30:00.422 2016-02-29 16:30:00.457 2016-02-29 16:30:00.491 2016-02-29 16:30:00.555 2016-02-29 16:30:00.557 2016-02-29 16:30:00.645 错误。出于同样的原因,我没有使用python来解决这个问题,因为我也不熟悉python,所以我想在Excel VBA中这样做。

我也不太确定这个逻辑,所以我需要一些帮助。

Type Mismatch

此外,我不确定如何从避免计算两个日期的差异的日期获得毫秒部分

如15:59:59.674我如何才能获得Option Explicit Sub Get_Closest_Dates() Application.ScreenUpdating = False Dim WI As Worksheet, WO As Worksheet Dim i As Long, ct As Long Dim num1 As Integer, num2 As Integer, num3 As Integer Dim df1, df2 Set WI = Sheet1 'INPUT SHEET Set WO = Sheet2 'OUTPUT SHEET WI.Range("A:A").NumberFormat = "YYYY-MM-DD HH:MM:SS" WO.Range("A:A").NumberFormat = "YYYY-MM-DD HH:MM:SS" WI.Range("B1") = "HOUR" WI.Range("C1") = "MINUTE" With WI .Range("B2").Formula = "=HOUR(A2)" .Range("B2:B" & Rows.Count).FillDown .Range("C2").Formula = "=MINUTE(A2)" .Range("C2:C" & Rows.Count).FillDown ct = WO.Range("A" & Rows.Count).End(xlUp).Row + 1 For i = 2 To 10000 num1 = .Range("C" & i).Value 'get Minutes num2 = .Range("C" & i + 1).Value If (num1 = 29 And num2 = 30) Then df1 = 0.5 - TimeValue(.Range("A" & i)) df2 = TimeValue(.Range("A" & i + 1)) - 0.5 If df1 < df2 Then WO.Range("A" & ct) = .Range("A" & i) ct = ct + 1 Else WO.Range("A" & ct) = .Range("A" & i + 1) ct = ct + 1 End If End If If (num1 = 59 And num2 = 0) Then df1 = 1 - TimeValue(.Range("A" & i)) df2 = TimeValue(.Range("A" & i + 1)) - 1 If df1 < df2 Then WO.Range("A" & ct) = .Range("A" & i) ct = ct + 1 Else WO.Range("A" & ct) = .Range("A" & i + 1) ct = ct + 1 End If End If Next i End With Application.ScreenUpdating = True MsgBox "Process Completed" End Sub

2 个答案:

答案 0 :(得分:1)

似乎您的第一个问题是将数据导入Excel。了解Excel可能不是处理如此大量数据的最佳程序(诸如Access之类的DB程序可能更好),您需要在多个列或工作表之间拆分数据;或者采集数据样本。

您选择了样本,因此我会在您阅读数据时进行抽样和测试。

您还必须在处理包含毫秒的日期/时间戳时处理Excel / VBA限制。

但是为了测试数据,没有必要关注毫秒。只要您的数据按升序排列,那么具有等于或高于30分钟增量的日期/时间戳的第一行将是最早的。

下面的代码应该只读取符合该条件的大文件行。请阅读评论以获取更多信息。

将线条收集到一个集合中;然后声明,填充结果数组,并将结果写入工作表。

如果每一行包含多个字段,而不仅仅是您显示的单行,那么在编写结果时,您将声明结果数组以保存所有列,并在此时填充它。

使用Collection / Array / write到工作表序列比在处理工作表时每行编写一行要快得多。

有加速代码的方法,以及处理可能的&#34;内存不足的方法&#34;错误,但这取决于您的真实数据以及使用这个简单代码的方式。

到目前为止转换我们需要的日期/时间戳,将Excel解释为字符串,转换为&#34; real&#34;日期/时间,取决于您对后续数据的处理方式。

==========================================

Option Explicit
'Set Reference to Microsoft Scripting Runtime
Sub GetBigData()
    Dim FSO As FileSystemObject
    Dim TS As TextStream
    Dim vFileName As Variant
    Dim sLine As String
    Dim dtLineTime As Date
    Dim dtNextTime As Date
    Dim colLines As Collection

vFileName = Application.GetOpenFilename("Text Files(*.txt), *.txt")
If vFileName = False Then Exit Sub

Set FSO = New FileSystemObject
Set TS = FSO.OpenTextFile(vFileName, ForReading, False, TristateFalse)
Set colLines = New Collection

With TS
    'Assumes date/time stamps are contiguous
    'skip any header lines
    Do
        sLine = .ReadLine
    Loop Until InStr(sLine, ".") > 0

'Compute first "NextTime"
'  note that it might be the first entry
'  comment line 3 below if want first entry
'  but would need to add logic if using other time increments
dtLineTime = CDate(Left(sLine, InStr(sLine, ".") - 1))
dtNextTime = Int(dtLineTime) + TimeSerial(Hour(dtLineTime), Int(Minute(dtLineTime) / 30) * 30, 0)
If Not (Minute(dtLineTime) = 30 Or Minute(dtLineTime) = 60) Then dtNextTime = dtNextTime + TimeSerial(0, 30, 0)

Do
    'Due to IEEE rounding problems, need to test equality as a very small value
    'Could use a value less than 1 second = 1/86400 or smaller
    If Abs(dtLineTime - dtNextTime) < 0.00000001 Or _
        dtLineTime > dtNextTime Then
            colLines.Add sLine
            dtNextTime = dtNextTime + TimeSerial(0, 30, 0)
    End If
    If Not .AtEndOfStream Then
        sLine = .ReadLine
        dtLineTime = CDate(Left(sLine, InStr(sLine, ".") - 1))
    End If
Loop Until .AtEndOfStream

.Close
End With

'Write the collection to the worksheet
Dim V As Variant
Dim wsResults As Worksheet, rResults As Range
Dim I As Long

Set wsResults = Worksheets("sheet1")
Set rResults = wsResults.Cells(1, 1)

ReDim V(1 To colLines.Count, 1 To 1)
Set rResults = rResults.Resize(UBound(V, 1), UBound(V, 2))

For I = 1 To UBound(V, 1)
     V(I, 1) = CStr(colLines(I))
Next I

With rResults
    .EntireColumn.Clear
    .NumberFormat = "@"
    .Value = V
    .EntireColumn.AutoFit
End With

End Sub

==========================================

编辑添加了时间戳转换功能。 这可以在将数据从集合对象复制到变体数组的点处实现。 EG:

V(I, 1) = ConvertTimeStamp(colLines(I))

由于收到的值是Double数据类型,您还需要在工作表上正确格式化该列,而不是将其作为Text:

.NumberFormat = "yyyy-mm-dd hh:mm:ss.000"

我们必须将值作为Double返回,因为VBA Date类型数据不支持毫秒。

==============================

Private Function ConvertTimeStamp(sTmStmp As String) As Double
    Dim dtPart As Date
    Dim dMS As Double 'milliseconds
    Dim V As Variant

'Convert the date and time
V = Split(sTmStmp, ".")
dtPart = CDate(V(0))
dMS = V(1)

ConvertTimeStamp = dtPart + dMS / 86400 / 1000

End Function

==============================

答案 1 :(得分:0)

如果您反转排序顺序,则可以使用匹配功能查找列表中的条目索引,该条目的索引大于(仅在特定时间之后)。 类似的东西:

= MATCH(HalfHourValue,RangeContainingTimes,-1)

你必须扭转秩序;并且它为您提供索引而不是实际值。

要获得刚刚找到的条目值的毫秒数,可以使用以下内容:

= RIGHT(TEXT(INDEX(RangeContainingTimes,IxFromAbove,1), “HH:MM:ss.000”),3)