Question

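I'm trying to extract a substring that is essentially a link to any website. The idea is that when a user posts something, the link is extracted and assigned to a variable called web_link. My current code is as follows: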

post = ("You should watch this video https://www.example.com if you have free time!")
web_link = post[post.find("http" or "www"):post.find(" ", post.find("http" or "www"))]

The code works fine if there is a space after the link, but not if the link is at the very end of the post. For example:

post = ("You should definitely watch this video https://www.example.com")

then post.find(" ") can't find a space and returns -1, so web_link ends up as "https://www.example.co".

If possible, I'm looking for a solution that doesn't involve an if statement.

Answer 1

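This doesn't work because when the string isn't found, find() returns -1, and the slice interprets a stop index of -1 as "one character before the end of the string", so the last character is cut off.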
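A quick interpreter check with the second sample post from the question shows the effect (searching only for "http" here, which is what `"http" or "www"` evaluates to anyway):

```python
post = "You should definitely watch this video https://www.example.com"

start = post.find("http")
finish = post.find(" ", start)  # no space after the link, so this is -1

print(finish)              # -1
print(post[start:finish])  # stop index -1 drops the last character: https://www.example.co
print(post[start:])        # omitting the stop index keeps everything: https://www.example.com
```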

As ifma pointed out, the best way to do this is with a regular expression. Something like:

import re
web_link = re.search(r"(https?://|www)\S+", post).group(0)
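Wrapped into a small function, a regex of this kind might look like the sketch below. The names `LINK_RE` and `extract_link` are hypothetical, and returning `None` when the post contains no link is an assumption (calling `.group(0)` directly on a failed search raises `AttributeError`). The pattern matches `http://`, `https://` or `www` followed by everything up to the next whitespace, so trailing punctuation such as `!` stays attached to the link.

```python
import re

# "http://", "https://" or a bare "www", then everything up to the next whitespace
LINK_RE = re.compile(r"(?:https?://|www)\S+")

def extract_link(post):
    """Return the first link-like substring in post, or None if there is none."""
    match = LINK_RE.search(post)
    return match.group(0) if match else None

print(extract_link("You should watch this video https://www.example.com if you have free time!"))
print(extract_link("You should definitely watch this video https://www.example.com"))
print(extract_link("No link here"))
```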

Answer 2

Use a regular expression. I adapted the solution here slightly.

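import re

def func(post):
    # Optional scheme, then a dotted host name, then an optional path/query part.
    # (?:...) groups the schemes; a [...] character class would match single characters.
    pattern = r"(?:(?:https?|ftp)://)?([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?"
    return re.search(pattern, post).group(0)

print(func("You should watch this video www.example.com if you have free time!"))
print(func("You should watch this video https://www.example.com"))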

Output:

www.example.com
https://www.example.com
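One caveat about regex prefixes of the form `[(http|ftp|https)://]*`: square brackets define a character class, not a group, so that prefix matches any run of the single characters `(`, `h`, `t`, `p`, `|`, `f`, `s`, `)`, `:` and `/` rather than a scheme as a unit. It happens to work on the sample posts, but a minimal sketch of the difference, using `(?:https?|ftp)://` as the grouped alternative:

```python
import re

char_class = r"[(http|ftp|https)://]*"  # any run of the listed characters
scheme_group = r"(?:https?|ftp)://"      # one of the schemes, followed by "://"

# The character class accepts scrambled input such as "tsh:/"
print(re.fullmatch(char_class, "tsh:/") is not None)    # True
print(re.fullmatch(scheme_group, "tsh:/") is not None)  # False

# The group accepts only real scheme prefixes
print(re.fullmatch(scheme_group, "https://") is not None)  # True
print(re.fullmatch(scheme_group, "ftp://") is not None)    # True
```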

That said, using an if is simpler and clearer:

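def func(post):
    # "http" or "www" evaluates to just "http", so try each prefix in turn
    start = post.find("http")
    if start == -1:
        start = post.find("www")
    finish = post.find(" ", start)
    # find() returns -1 when the link is the last word, so slice to the end instead
    return post[start:] if finish == -1 else post[start:finish]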

This still uses .find(" "), but it no longer drops the last character when the link is at the end of the post.
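Since the question asks for a version without an if, one alternative not taken from either answer is to slice from the start of the link and split off the first whitespace-delimited word. `str.split(maxsplit=1)` returns the whole remainder as its first element when no whitespace follows, so the end-of-post case needs no special handling. The name `link_from` and its `prefix` parameter are hypothetical, and the sketch assumes the prefix actually occurs in the post.

```python
def link_from(post, prefix="http"):
    # Assumes prefix occurs in post; find() would return -1 otherwise.
    start = post.find(prefix)
    # split(maxsplit=1) cuts at the first run of whitespace;
    # if the link is the last word, the whole remainder is element 0.
    return post[start:].split(maxsplit=1)[0]

print(link_from("You should watch this video https://www.example.com if you have free time!"))
print(link_from("You should definitely watch this video https://www.example.com"))
print(link_from("You should watch this video www.example.com", "www"))
```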
