在Excel vba

时间:2016-07-18 15:28:31

标签: excel vba excel-vba

我试图在Excel vba中创建一个应用程序,但我有些问题。我需要我的Excel应用程序来下载一些zip格式的文件。我已经完成了这部分问题,我的应用程序可以下载并解压缩文件。接下来我要读取.htm扩展名中的整个文件并从中获取一些信息。需要像这样工作,当应用程序打开时,程序应该查找在Basil中的“concurso”的最后一个竞赛号码,然后在.htm文件中查找相同的号码并开始复制下一个数据。

我已经发现了一种读取文件并获取我想要的数据的模式,但我不知道如何编写它。要提取的.htm文件中的模式是内部标签td,一个有2个斜线的文本,因此,我有一个日期,此时,我要做3件事,得到日期,日期上面的行我的concurso数量,所以我也需要得到它,并且15天以下的15行我也需要它们。此模式不会更改,必须处理直到.htm文件结束。并将这些数据传输到我的工作表以便稍后进行操作。

如果对这个问题有些怀疑,我会提供进一步的澄清 这是我用来下载和解压缩文件的代码。↓

Sub DownloadEUnzip()
    Dim FSO, oApp As Object
    Dim objHttp, DefPath, Arquivo As String
    Dim Dados() As Byte
    Dim Fname As Variant
    Dim FileNameFolder As Variant
    Dim iFileNumber As Long

    Dim diretorio As String

    diretorio = Dir("c:\lotofacil\D_LOTFAC.HTM")

    If diretorio = "D_LOTFAC.HTM" Then
        Kill "C:\lotofacil\*"
    End If

    Set objHttp = CreateObject("MSXML2.ServerXMLHTTP")
    objHttp.Open "GET", "http://www1.caixa.gov.br/loterias/_arquivos/loterias/D_lotfac.zip", False
    objHttp.Send
    DefPath = "C:\lotofacil\" '<<< Altere aqui
    Arquivo = DefPath & "D_lotfac.zip"
    If objHttp.Status = "200" Then
        Dados = objHttp.ResponseBody
        iFileNumber = FreeFile
        Open Arquivo For Binary Access Write As #iFileNumber
        Put #iFileNumber, 1, Dados
        Close #iFileNumber
    End If
    If Right(DefPath, 1) <> "\" Then
        DefPath = DefPath & "\"
    End If
    FileNameFolder = DefPath

    Set oApp = CreateObject("Shell.Application")
    oApp.Namespace(FileNameFolder).CopyHere oApp.Namespace("C:\lotofacil\D_lotfac.zip").items
    MsgBox "Arquivos baixados e descompactados com sucesso!"
End Sub

HERE可以下载文件以查看问题。

ps 在C:驱动器上创建一个名为lotofacil的文件夹,以便电子表格正常工作。

更新1

查找日期的代码

If Mid(dataline, 19, 1) = "/" And Mid(dataline, 22, 1) = "/" Then
    Debug.Print dataline
End If

更新2

所以caio,现在真的很快,但是当我使用时我注意到该程序占用了一个不到需要的列,并且我更改了代码并且它的工作原理显然......就像你看一看如果我没有任何混乱...... 我改变了数组的大小,看起来很有用:)看看。

Sub ReadLines()

Dim dataArray() As String
Dim strText
Dim result As String
Dim regExDate As New RegExp, regExAnyContent As New RegExp
Dim matches As MatchCollection
Dim match As match
Dim previous As String, current As String
Dim currentLine As Integer
ReDim dataArray(17, 1000)

regExDate.Pattern = "(\d{2}/\d{2}/\d{4})"
regExAnyContent.Pattern = "<td[^>]*>([^<]*)"
dirPath = "c:\lotofacil\"
filePath = dirPath & "D_LOTFAC.HTM"
result = ""
currentLine = 0

If Not Dir(filePath) = "D_LOTFAC.HTM" Then Exit Sub
FileNum = FreeFile()

Open filePath For Input As #FileNum
previous = ""

While Not EOF(FileNum)
    Line Input #FileNum, current ' read in data 1 line at a time

    If Len(current) > 0 Then
        Set matches = regExDate.Execute(current)
        If matches.Count > 0 Then
            dataArray(1, currentLine) = matches.Item(0)
            dataArray(0, currentLine) = regExAnyContent.Execute(previous).Item(0).SubMatches(0)
            For i = 1 To 16
                Line Input #FileNum, current
                While current = ""
                    Line Input #FileNum, current
                Wend
                dataArray(1 + i, currentLine) = regExAnyContent.Execute(current).Item(0).SubMatches(0)
            Next
            currentLine = currentLine + 1
            If currentLine Mod 1000 = 0 Then
                ReDim Preserve dataArray(17, currentLine + 1000)
            End If
        End If
        previous = current
    End If


    ' decide what to do with dataline,
    ' depending on what processing you need to do for each case
Wend

Range(Cells(1, 1), Cells(currentLine, 17)) = Application.Transpose(dataArray)

End Sub

但仍然发生了一件非常奇怪的事情,在数据表中,日期是错误的,我需要它们的格式为dd / mm / yyyy,我知道我已经在烦你了,但如果是太难做出这个改变你可以取消这个日期的专栏吗?请...

首先非常感谢你,你真的很擅长excel;)

1 个答案:

答案 0 :(得分:1)

尝试将此文件读取到剪贴板并将其内容粘贴到工作表中,这将创建一个您可以使用的普通Excel表。

这将使用excel的自然能力将html表解析为常规excel表。

Sub ReadFilePasteAsTable()
Dim objData As New MSForms.DataObject
Dim strText
Dim result As String
Dim numberOfLines Integer
Dim wsh As Object
Set wsh = VBA.CreateObject("WScript.Shell")


numberOfLines = 126
dirPath = "c:\lotofacil\"
diretorio = Dir(dirPath & "D_LOTFAC.HTM")
result = ""

If Not diretorio = "D_LOTFAC.HTM" Then Exit Sub
FileNum = FreeFile()

filePath = dirPath & "D_LOTFAC.HTM"
outPath = dirPath & "out.txt"
pscommand = "Powershell -Command ""''+$(cat " & filePath & " -Tail 126) > " & outPath & """"
wsh.Run pscommand, 0, True

Open outPath For Input As #FileNum

While Not EOF(FileNum)
    Line Input #FileNum, DataLine ' read in data 1 line at a time
    result = result & DataLine
    ' decide what to do with dataline,
    ' depending on what processing you need to do for each case
Wend

    objData.SetText result
    objData.PutInClipboard

ActiveSheet.Paste Destination:=[A1]
End Sub

不要忘记添加对Microsoft Forms 2.0的引用。要添加引用打开VBA窗口,请打开菜单Tools-&gt; References ...

如果找不到 Microsoft Forms 2.0对象库,请打开浏览...对于64位操作系统或C:\ Windows \ System32,它将位于C:\ Windows \ SysWOW64 \ FM20.dll中\ FM20.dll为32位。

更新

现在您需要添加对Microsoft VBScript Regular Expressions 5.5

的引用
Sub ReadLines()
Dim dataArray() As String
Dim strText
Dim result As String
Dim regExDate As New RegExp, regExAnyContent As New RegExp
Dim matches As MatchCollection
Dim match As match
Dim previous As String, current As String
Dim currentLine As Integer
ReDim dataArray(16, 1000)

regExDate.Pattern = "(\d{2}/\d{2}/\d{4})"
regExAnyContent.Pattern = "<td[^>]*>([^<]*)"
dirPath = "c:\lotofacil\"
filePath = dirPath & "D_LOTFAC.HTM"
result = ""
currentLine = 0

If Not Dir(filePath) = "D_LOTFAC.HTM" Then Exit Sub
FileNum = FreeFile()

Open filePath For Input As #FileNum
previous = ""

While Not EOF(FileNum)
    Line Input #FileNum, current ' read in data 1 line at a time

    If Len(current) > 0 Then
        Set matches = regExDate.Execute(current)
        If matches.Count > 0 Then
            dataArray(1, currentLine) = matches.Item(0)
            dataArray(0, currentLine) = regExAnyContent.Execute(previous).Item(0).SubMatches(0)
            For i = 1 To 15
                Line Input #FileNum, current
                While current = ""
                    Line Input #FileNum, current
                Wend
                dataArray(1 + i, currentLine) = regExAnyContent.Execute(current).Item(0).SubMatches(0)
            Next
            currentLine = currentLine + 1
            If currentLine Mod 1000 = 0 Then
                ReDim Preserve dataArray(16, currentLine + 1000)
            End If
        End If
        previous = current
    End If


    ' decide what to do with dataline,
    ' depending on what processing you need to do for each case
Wend



Range(Cells(1, 1), Cells(currentLine, 16)) = Application.Transpose(dataArray)
End Sub