Question

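I have a very large string and I want to find a small substring or value within it (14 in my example). The catch is that 78 is dynamic; I get its value from a dictionary (someDict).

A snippet of the string, along with what I have tried, looks like this: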

str1='dnas  ANYTHING Here <td class="tr js-name"><a href="/myportal/report/78/abc/xyz/14" title="balh">blah</a></td>'

str2="/myportal/report/"+str(someDict["Id"])+"/abc/xyz/"

p = re.compile(r'str2\s*(.*?)\"')
match = p.search(str1)
if match:
    print(match.group(1))
else:
    print("cant find it")

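I know that p = re.compile(r'str2\s*(.*?)\"') is wrong, because writing str2 inside the pattern string matches the literal text "str2" rather than the variable's value. How can I use the variable with re.compile?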

Answer 1

The string you are parsing looks like HTML, and regular expressions are not exactly the best tool for the job. I would use a more specialized tool, an HTML parser such as BeautifulSoup:

from urllib.parse import urlparse

from bs4 import BeautifulSoup


data = 'dnas  ANYTHING Here <td class="tr js-name"><a href="/myportal/report/78/abc/xyz/14" title="balh">blah</a></td>'

soup = BeautifulSoup(data, "html.parser")
href = soup.select_one("td.tr.js-name > a")["href"]

parsed_url = urlparse(href)
print(parsed_url.path.split("/")[-1])

This prints 14.
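
If you would still rather stay with a regex, the variable has to be concatenated into the pattern instead of being written inside the string literal. Below is a minimal sketch, assuming someDict["Id"] holds 78 (some_dict is a hypothetical stand-in for your dictionary). re.escape is applied so that any special characters in the dynamic prefix are matched literally:

```python
import re

# Hypothetical stand-in for the asker's someDict; the value 78 is assumed.
some_dict = {"Id": 78}

str1 = ('dnas  ANYTHING Here <td class="tr js-name">'
        '<a href="/myportal/report/78/abc/xyz/14" title="balh">blah</a></td>')

# Build the dynamic prefix as ordinary string data, then escape it for regex use.
prefix = "/myportal/report/" + str(some_dict["Id"]) + "/abc/xyz/"

# Capture everything after the prefix up to the closing double quote of href.
pattern = re.compile(re.escape(prefix) + r'([^"]*)"')

match = pattern.search(str1)
result = match.group(1) if match else None
print(result if result is not None else "can't find it")  # prints 14
```

This works on the sample string, but it depends on the exact quoting and attribute layout of the markup, which is the usual reason to prefer a parser.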

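Note that td.tr.js-name > a in the select_one() call is a CSS selector, one of the techniques available for locating elements in HTML:

> denotes a direct parent-child relationship
td.tr.js-name matches a td element that has both the tr and js-name class values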
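
To illustrate the two points about the selector, here is a small sketch on made-up markup (the three td cells and their href values are invented for the demonstration). It compares the child combinator > with the plain descendant form:

```python
from bs4 import BeautifulSoup

# Invented markup: one direct child link, one nested link, one td missing a class.
html = (
    '<table><tr>'
    '<td class="tr js-name"><a href="/direct">direct child</a></td>'
    '<td class="tr js-name"><span><a href="/nested">nested</a></span></td>'
    '<td class="tr"><a href="/other">no js-name class</a></td>'
    '</tr></table>'
)
soup = BeautifulSoup(html, "html.parser")

# ">" keeps only <a> elements that are immediate children of a matching <td>.
child_links = [a["href"] for a in soup.select("td.tr.js-name > a")]

# A space (descendant combinator) also accepts <a> elements nested deeper.
descendant_links = [a["href"] for a in soup.select("td.tr.js-name a")]

print(child_links)       # ['/direct']
print(descendant_links)  # ['/direct', '/nested']
```

The third cell is skipped in both cases because its td carries only the tr class and not js-name.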

Python: using regex to find a string with a dynamic value in a large string
