How can I extract a URL from a string when there is a space between the protocol and the rest of the address?

Time: 2017-08-04 12:11:11

Tags: python regex

Suppose I have the following string (in Python):

myString = "For further information please visit http:// somewebpage.com and please do not hesitate to contact us"

I'd like to extract the following URL:

http:// somewebpage.com

I found regex solutions, but none that handle a space before the address.

4 answers:

Answer 0 (score: 4)

Like this:

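words = myString.split()               # split the sentence on whitespace
index = words.index('http://')         # raises ValueError if 'http://' is not a separate word
url = ''.join(words[index:index+2])    # 'http://' plus the word right after it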

Notice that I'm splitting the sentence into words, but only joining the http:// part with the word immediately after it.

If you actually need the space (I can't imagine why), replace '' with ' ' in the join call.
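
As a rough sketch, the same idea can be wrapped in a small function that returns None instead of raising when the scheme is missing. The function name extract_split_url and the keep_space parameter are made up for this illustration, not part of the answer above:

```python
def extract_split_url(text, scheme="http://", keep_space=False):
    """Join the scheme token with the word that follows it, or return None."""
    words = text.split()
    if scheme not in words:
        # The scheme only counts if it stands alone as a word,
        # i.e. it is followed by whitespace in the original text.
        return None
    index = words.index(scheme)
    separator = " " if keep_space else ""
    return separator.join(words[index:index + 2])


my_string = ("For further information please visit http:// somewebpage.com "
             "and please do not hesitate to contact us")

print(extract_split_url(my_string))                   # http://somewebpage.com
print(extract_split_url(my_string, keep_space=True))  # http:// somewebpage.com
```

Note that if the scheme happens to be the last word of the text, the slice contains only the scheme, so the function returns just 'http://'.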

Answer 1 (score: 1)

Pure regex solution:

http://\s[\w\.]+
  • \s matches a single whitespace character
  • [\w\.] matches any word character (letter, digit, or underscore) or a period
  • + repeats the preceding character class one or more times
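
A minimal sketch of how this pattern might be used from Python with the standard re module (the variable names are only for illustration; the backslash before the period is dropped because it is not needed inside a character class):

```python
import re

my_string = ("For further information please visit http:// somewebpage.com "
             "and please do not hesitate to contact us")

# re.search returns the first match object, or None if nothing matches
match = re.search(r"http://\s[\w.]+", my_string)
url = match.group(0) if match else None
print(url)  # http:// somewebpage.com
```

Because the character class includes the period, a URL at the very end of a sentence would also pick up the closing full stop.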

Answer 2 (score: 1)

Try this regex:

>>> import re
>>> mystring = "For further information please visit http:// somewebpage.com and please do not hesitate to contact us"

>>> url = re.findall(r'http[s]?:// (?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', mystring)[0]
>>> url
'http:// somewebpage.com'

Answer 3 (score: 0)

/https?:\/\/\s\S+/g
  • http - Matches the literal sequence http
  • s? - Matches 0 or 1 s (for https)
  • : - Matches :
  • \/\/ - Matches two forward slashes (escaped because / delimits the regex literal)
  • \s - Matches one whitespace character
  • \S+ - Matches any non-whitespace character 1 or more times

The regex will match:

http:// somewebpage.com
https:// somewebpage.com
http:// 1234.com/test

But not:

ftp:// www.test.com.xx
http://www.google.com
http://  (a space with nothing after it)

http://www.regexpal.com/?fam=98273
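
The regex above is written as a JavaScript-style literal. A rough Python equivalent, assuming the standard re module, drops the / delimiters and the g flag (re.findall already returns every match). The sample lists below are the ones from this answer:

```python
import re

# Same pattern without the /.../g wrapper; a raw string needs no escaped slashes
PATTERN = re.compile(r"https?://\s\S+")

should_match = [
    "http:// somewebpage.com",
    "https:// somewebpage.com",
    "http:// 1234.com/test",
]
should_not_match = [
    "ftp:// www.test.com.xx",   # wrong scheme
    "http://www.google.com",    # no space after the slashes
    "http:// ",                 # nothing after the space
]

for text in should_match + should_not_match:
    print(repr(text), bool(PATTERN.search(text)))

my_string = ("For further information please visit http:// somewebpage.com "
             "and please do not hesitate to contact us")
urls = PATTERN.findall(my_string)
print(urls)  # ['http:// somewebpage.com']
```

Since \s matches any whitespace, this also accepts a tab or a newline between the slashes and the address.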