Question

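I want my regular expression to find a URL so that it can be converted into an HTML link. The regex should work for links like these: www.site.extension and https://site.extension. The regex is \S*\.?w{3}\.\S+\.\S+, and when I test it on https://regexr.com/ it does give the desired result. However, when I use it in my Python script, I get the opposite of what I expect: everything that is not a link is treated as if it were one, while the actual link is not found.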

The Python code is:

import re

testbestand = """TESTBESTAND

Div1 kjaskdjfiudhgjnkcvdnbk djskj ij g ijg jkdfnbdiiji jj iikdafnbn ojedfkj giqw34
Akdjfkjasdf

Div2 aksjdfkj sadfkjg sdkjiew kvckjeri cdkj sdkeridk erkire

Div3 kajkdjfkjakdjgsdghijskdg

Div 4 www.link.com

Div5
Table Left  Table Right
Table Left 2    Table Right 2
Table Left 3    Table Right 3
"""

fileContent = testbestand
toAddToFile = ""

#find links
pattern = re.compile(r'\S*\.?w{3}\.\S+\.\S+')
matches = re.split(pattern, fileContent)

for match in matches:
    match = match.strip()

    if len(match) > 0:
        #TODO change to 'edit' file, instead of adding to it
        test = '<a href="' + match + '">' + match + "</a>"
        print(test)

        toAddToFile += test

Thanks in advance for your help! If you need more information or code, I'll provide it right away.

Answer 1

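That's because you are using re.split, which is meant to split the text at the pattern, so it returns the pieces between the matches rather than the matches themselves. Use re.findall instead: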

pattern = re.compile(r'\S*\.?w{3}\.\S+\.\S+')
matches = pattern.findall(fileContent)
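
To make the difference concrete, here is a minimal sketch that runs both calls on the same input. The sample string is made up for illustration and is not the asker's file; the pattern is the one from the question.

```python
import re

# Pattern taken from the question.
pattern = re.compile(r'\S*\.?w{3}\.\S+\.\S+')

# Hypothetical sample text, not the asker's actual file content.
sample = "Div 4 www.link.com and more text"

# re.split returns the text *around* each match (the non-link parts).
pieces = re.split(pattern, sample)

# findall returns the matched substrings themselves (the links).
links = pattern.findall(sample)

print(pieces)  # ['Div 4 ', ' and more text']
print(links)   # ['www.link.com']
```

This is why the original loop wrapped everything except the link in an anchor tag: it was iterating over the leftovers from the split.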

Answer 2

You should use re.sub instead of re.split. It replaces each match in place, so the rest of the text is left untouched:

toAddToFile = re.sub(r'(\S*\.?w{3}\.\S+\.\S+)', r'<a href="\1">\1</a>', fileContent)
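
As a rough sketch of how this could be packaged, the substitution can be wrapped in a small helper. The names `LINK_PATTERN` and `linkify` are hypothetical and chosen only for this example; the pattern and replacement string are the ones shown above.

```python
import re

# Same pattern as above, with a capturing group so \1 refers to the URL.
LINK_PATTERN = re.compile(r'(\S*\.?w{3}\.\S+\.\S+)')


def linkify(text):
    """Wrap every match of LINK_PATTERN in an HTML anchor tag."""
    return LINK_PATTERN.sub(r'<a href="\1">\1</a>', text)


print(linkify("Div 4 www.link.com"))
# Div 4 <a href="www.link.com">www.link.com</a>
```

Text without a match comes back unchanged. Note that an href without a scheme, such as www.link.com, is treated by browsers as a relative URL, so you may want to add one when the match lacks it.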

Python: regex gives the opposite of the expected result
