Unable to get strings starting with a custom keyword

Time: 2018-06-17 12:00:25

Tags: string vba excel-vba web-scraping excel

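I am trying to fetch strings from a Wikipedia page that start with the keywords War and LPGA. I am not using those two keywords directly; instead I pass them through the keyword variable, because qsp may contain more items.

However, when I run the script with the Like operator, I get nothing, and no error either. When I run it with If InStr(post.innerText, keyword) > 0 Then, I do get results. The problem is that with InStr() the script fetches every string that contains the value of keyword, not just the strings that start with it.

So, how can I use the Like operator in my script below to achieve this?

This is my attempt so far: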

Sub FetchInfo()
    Const URL As String = "https://en.wikipedia.org/wiki/Portal:Current_events"
    Dim Http As New XMLHTTP60, Html As New HTMLDocument
    Dim post As Object, qsp As Variant, keyword As Variant, R&

    qsp = [{"War in Donbass","LPGA Tour"}]

    For Each keyword In qsp
        keyword = Split(keyword, " ")(0)
        With Http
            .Open "GET", URL, False
            .send
            Html.body.innerHTML = .responseText
        End With

        For Each post In Html.getElementsByTagName("a")
            If post.innerText Like "*keyword" Then
                R = R + 1: Cells(R, 1) = post.innerText
            End If
        Next post
    Next keyword
End Sub

When I use the InStr() function, that part looks like this:

If InStr(post.innerText, keyword) > 0 Then
    R = R + 1: Cells(R, 1) = post.innerText
End If

To be clearer: if I search for War, these are the kind of results I want (the results below are hypothetical and may not exist on that site):

war house
war of the worlds

and not results like:

city of war
causes of the war

1 answer:

Answer 0: (score: 1)

I know you like answers that fit your problem statement very closely, but I was interested in whether I could use Selenium for this. So I ran the following with "War" to see whether I could match the strings of a tags that start with "War". It could obviously be extended along the lines of your original example, but does it match the general task?

Side note: I guess you could have used Split on .innerText and tested the LBound element in your example.
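A minimal sketch of that side note, assuming link text is split on single spaces. The helper names FirstWordIs and StartsWithKeyword are hypothetical and not part of the original code. The second helper shows the Like form: in the question, Like "*keyword" compares against the literal text "keyword" at the end of the string, whereas concatenating the variable and placing the wildcard after it tests for a prefix.

```vba
Option Explicit

' Hypothetical helper: True when the first space-delimited word of
' candidate equals keyword (the Split / LBound idea from the side note).
Public Function FirstWordIs(ByVal candidate As String, ByVal keyword As String) As Boolean
    Dim parts() As String
    parts = Split(Trim$(candidate), " ")
    ' Split on an empty string yields an array with UBound = -1, so guard first.
    If UBound(parts) >= LBound(parts) Then
        FirstWordIs = (parts(LBound(parts)) = keyword)
    End If
End Function

' Hypothetical helper: prefix test with Like. The keyword's value is
' concatenated into the pattern and the wildcard goes after it.
Public Function StartsWithKeyword(ByVal candidate As String, ByVal keyword As String) As Boolean
    StartsWithKeyword = candidate Like (keyword & "*")
End Function
```

Either helper could replace the condition inside the For Each post loop, for example If StartsWithKeyword(post.innerText, keyword) Then. Both comparisons are case-sensitive unless the module declares Option Compare Text, and Like treats characters such as *, ?, # and [ in the keyword as pattern characters.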


XPath

I use starts-with to retrieve the strings.

The XPath query was run against the following page: https://en.wikipedia.org/wiki/War_correspondent


Code output:

[Image: Output]


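It does not match items that end with War. Selenium Basic does not seem to allow the XPath query needed for that, but if you use an XPath tester, the retrieved items would be (small sample of results):

[Image: Sample results not matched]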


VBA:

starts-with
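A minimal sketch of a starts-with query through Selenium Basic, not the original listing. It assumes the SeleniumBasic add-in is installed, the VBA project references "Selenium Type Library", and Chrome with a matching driver is available. The procedure name and the KEYWORD constant are hypothetical.

```vba
Option Explicit

' Assumption: SeleniumBasic is installed and "Selenium Type Library"
' is referenced; ChromeDriver is only one of the possible drivers.
Public Sub FetchLinksStartingWithKeyword()
    Const URL As String = "https://en.wikipedia.org/wiki/War_correspondent"
    Const KEYWORD As String = "War"   ' hypothetical single keyword, case-sensitive

    Dim driver As New ChromeDriver
    Dim links As WebElements, link As WebElement
    Dim r As Long

    driver.Get URL

    ' XPath 1.0 starts-with: keep only <a> elements whose text begins with KEYWORD.
    Set links = driver.FindElementsByXPath( _
        "//a[starts-with(normalize-space(.), '" & KEYWORD & "')]")

    ' Write each matching link text to column A of the active sheet.
    For Each link In links
        r = r + 1
        ActiveSheet.Cells(r, 1).Value = link.Text
    Next link

    driver.Quit
End Sub
```

To handle several keywords as in the question, the query and the loop could be wrapped in a For Each over qsp. Note that .Text returns only visible text, so hidden links may come back as empty strings.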