How do I replace all URLs in a string with `hostnametld`?

Time: 2018-10-30 17:10:13

Tags: python regex

For example:

http://stackoverflow.com/questions/ask => stackoverflowcom

The following approach works, but not in the special case where http or https appears outside of a URL.

import re
from urllib.parse import urlparse

def convert_urls_to_hostnames(s):
    try:
        new_s = re.sub("http\S+", lambda match: urlparse(match.group()).hostname.replace('.','') if match.group() else urlparse(match.group()).hostname, s)
        return new_s
    except Exception as e:
        print(e)
    return s

It mostly works.

s = "Ask questions here: http://stackoverflow.com/questions/ask"
print(convert_urls_to_hostnames(s))

Correctly returns: Ask questions here: stackoverflowcom

However, it fails if http or https is found anywhere in the string outside of a URL:

s = "Urls may start with http or https like so: http://stackoverflow.com/questions/ask and https://example.com/questions/"
print(convert_urls_to_hostnames(s))

This returns: 'NoneType' object has no attribute 'replace'

Expected return: Urls may start with http or https like so: stackoverflowcom and examplecom

1 answer:

Answer 0 (score: 0)

Look for http:// in the regular expression, i.e. https:// as well, so that the pattern only matches when the scheme is followed by ://.
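One way to apply this suggestion is sketched below. The pattern `https?://\S+` and the helper name `to_hostname` are assumptions, not taken from the answer, and the sketch leaves a match unchanged whenever `urlparse` cannot extract a hostname, so `.replace` is never called on `None`.

```python
import re
from urllib.parse import urlparse

# Assumed pattern: "http" or "https", then "://", then any run of non-whitespace.
# Requiring "://" keeps the bare words "http" and "https" from matching.
URL_PATTERN = re.compile(r"https?://\S+")

def convert_urls_to_hostnames(s):
    def to_hostname(match):
        hostname = urlparse(match.group()).hostname
        # If no hostname could be parsed, keep the matched text as it was.
        return hostname.replace(".", "") if hostname else match.group()
    return URL_PATTERN.sub(to_hostname, s)

s = "Urls may start with http or https like so: http://stackoverflow.com/questions/ask and https://example.com/questions/"
print(convert_urls_to_hostnames(s))
# Urls may start with http or https like so: stackoverflowcom and examplecom
```

Because `\S+` runs up to the next whitespace, punctuation that directly follows a URL (such as a trailing comma) becomes part of the match; it is dropped along with the path when the match is replaced by the hostname.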