Question

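OK, I know people will say this question has been asked a million times... but my question is different. I have searched Stack Overflow several times to make sure this is not a duplicate.

I want to use a regular expression in Python that can also extract URLs containing fragments from a string.

What I have done so far is: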

import re

test = 'This is a string with my URL as follows http://www.example.org/foo.html#bar and here i continue with my string'

test = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', test)

print (test)

The output of the code above is ['http://www.example.org/foo.html']

That is not what I want...

I want the output to be ['http://www.example.org/foo.html#bar']

Answer 1

Your original regular expression is:

http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+

Can't you just add '#' to it, like this?

http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),#]|(?:%[0-9a-fA-F][0-9a-fA-F]))+
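A quick way to check this is to run both patterns against the string from the question. This is only a minimal sketch: the names ORIGINAL, WITH_FRAGMENT, original_matches and fragment_matches are made up for illustration. The original pattern stops at '#' because the range $-_ covers the characters from 0x24 to 0x5F, and '#' (0x23) sits just below it.

```python
import re

# Sample text taken from the question.
text = ('This is a string with my URL as follows '
        'http://www.example.org/foo.html#bar and here i continue with my string')

# Pattern from the question. '#' (0x23) is just outside the range $-_ (0x24 to 0x5F),
# so matching stops at the fragment separator.
ORIGINAL = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'

# Same pattern with '#' added to the punctuation character class.
WITH_FRAGMENT = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),#]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'

original_matches = re.findall(ORIGINAL, text)
fragment_matches = re.findall(WITH_FRAGMENT, text)

print(original_matches)   # ['http://www.example.org/foo.html']
print(fragment_matches)   # ['http://www.example.org/foo.html#bar']
```

Both patterns are written as raw strings so that the backslashes before the parentheses reach the regex engine unchanged.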

I'm not clear on what you mean by 'fragment'... do you mean the spaces in the string?
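If 'fragment' is meant in the usual URL sense, that is, the part after '#', then the standard library can show which piece that is. A small sketch under that assumption (the variable name parts is just for illustration):

```python
from urllib.parse import urlsplit

# Split the URL from the question into its components.
parts = urlsplit('http://www.example.org/foo.html#bar')

print(parts.path)      # /foo.html
print(parts.fragment)  # bar
```

Here the fragment is 'bar', which is exactly the piece the original pattern leaves out.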
