Regex to check URL format

Time: 2016-10-04 03:07:07

Tags: ruby regex string-matching

I want to check whether my URL is in the correct format, including some AWS access keys and so on:

/https://bucket.s3.amazonaws.com/path/file.txt?AWSAccessKeyId=[.+]&Expires=[.+]&Signature=[.+]/.match(url)

^ Something like this. Can you help?

2 Answers:

Answer 0 (score: 1)

The URI RFC (RFC 3986, Appendix B) specifies a regular expression for parsing URLs and URIs:

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
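As a rough sketch of how that expression can be applied in Ruby: every component in it is optional, so it matches almost any string and is better suited to splitting a URI into parts than to validating one. The constant `RFC3986_URI` and the helper `split_uri` are names made up for this example, and the `\A`/`\z` anchors are an addition so that the whole string is matched.

```ruby
# RFC 3986, Appendix B regex, anchored with \A and \z (the anchors are an addition).
RFC3986_URI = %r{\A(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?\z}

# Hypothetical helper: split a URI string into its five generic components.
# Groups 2, 4, 5, 7 and 9 hold scheme, authority, path, query and fragment.
def split_uri(str)
  m = RFC3986_URI.match(str)
  return nil unless m

  {
    scheme:    m[2],
    authority: m[4],
    path:      m[5],
    query:     m[7],
    fragment:  m[9]
  }
end

parts = split_uri("https://bucket.s3.amazonaws.com/path/file.txt?AWSAccessKeyId=abcd12345&Expires=12345678&Signature=abcd")
puts parts[:scheme]     # https
puts parts[:authority]  # bucket.s3.amazonaws.com
puts parts[:path]       # /path/file.txt
puts parts[:query]      # AWSAccessKeyId=abcd12345&Expires=12345678&Signature=abcd
```

The query string still has to be split into individual parameters afterwards.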

You can also use the URI module from the Ruby standard library:

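require 'uri'
# URI.regexp is marked obsolete; RFC2396_Parser#make_regexp builds the same kind of pattern.
# \A and \z anchor the whole string, whereas ^ and $ only anchor a single line.
if url =~ /\A#{URI::RFC2396_Parser.new.make_regexp(%w(http https))}\z/
  puts "it's a URL alright"
else
  puts "that's no URL, that's a spaceship"
end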
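If you would rather not build a regex at all, one possible alternative is to let `URI.parse` do the work and inspect the class of the result. This is only a sketch: `http_url?` is a hypothetical helper name, and requiring a non-empty host is an assumption about what counts as a usable URL here.

```ruby
require 'uri'

# Hypothetical helper: true when `str` parses as an http or https URL with a host.
# URI::HTTPS is a subclass of URI::HTTP, so a single is_a? check covers both schemes.
def http_url?(str)
  uri = URI.parse(str)
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  # URI.parse raises for strings it cannot parse, e.g. ones containing spaces.
  false
end

puts http_url?("https://bucket.s3.amazonaws.com/path/file.txt?AWSAccessKeyId=abcd12345")
puts http_url?("ftp://example.com/file.txt")
puts http_url?("not a url")
```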

To check for the presence of "some AWS access keys and so on", you can do something like this:

require 'uri'
uri = URI.parse(url)
# uri.query is nil when the URL has no query string, so fall back to ''
params = URI.decode_www_form(uri.query.to_s).to_h
if params.key?('AWSAccessKeyId')
  unless params['AWSAccessKeyId'] =~ /\A[a-f0-9]{32}\z/
    abort 'AWSAccessKeyId not valid'
  end
else
  abort 'AWSAccessKeyId required'
end
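The same idea can be extended to all three parameters from the question. The following is a minimal sketch that only checks presence, not format; `REQUIRED_PARAMS` and `missing_params` are names invented for this example.

```ruby
require 'uri'

# Parameter names taken from the URL in the question.
REQUIRED_PARAMS = %w[AWSAccessKeyId Expires Signature].freeze

# Hypothetical helper: returns the required parameter names that are
# missing or empty in the URL's query string (an empty array means all are present).
def missing_params(url)
  query  = URI.parse(url).query.to_s      # nil query becomes ''
  params = URI.decode_www_form(query).to_h
  REQUIRED_PARAMS.select { |name| params[name].to_s.empty? }
end

url = "https://bucket.s3.amazonaws.com/path/file.txt?Signature=abcd&AWSAccessKeyId=abcd12345"
missing = missing_params(url)
abort "missing parameters: #{missing.join(', ')}" unless missing == ["Expires"]
```

Because the parameters end up in a hash, their order in the query string does not matter.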

Of course you can parse them directly with a regular expression, but it gets ugly because the parameters may appear in a different order:

>> url = "https://bucket.s3.amazonaws.com/path/file.txt?AWSAccessKeyId=abcd12345&Expires=12345678&Signature=abcd"
>> matchdata = url.match(
   /
    \A
      (?<scheme>http(?:s)?):\/\/
      (?<host>[^\/]+)
      (?<path>\/.+)\?
      (?=.*(?:[\?\&]|\b)AWSAccessKeyId\=(?<aws_access_key_id>[a-f0-9]{1,32}))
      (?=.*(?:[\?\&]|\b)Expires=(?<expires>[0-9]+))
   /x
  )
=> #<MatchData "https://bucket.s3.amazonaws.com/path/file.txt?"
  scheme:"https" 
  host:"bucket.s3.amazonaws.com" 
  path:"/path/file.txt"
  aws_access_key_id:"abcd12345"
  expires:"12345678">

>> matchdata[:aws_access_key_id]
# => "abcd12345"

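This uses:

  1. Regex positive lookahead, (?=..), to ignore the parameter order
  2. Ruby's regex named captures, (?<param_name>.*), to identify the parameters in the match data
  3. Non-capturing groups, (?:abcd|efgh)
  4. A (?:[\&\?]|\b) matcher to handle both ?Expires=... and &Expires=...
  5. And finally the /x free-spacing modifier, to allow nicer formatting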
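To see the effect of the lookaheads, here is a small sketch that runs essentially the same pattern against a URL whose parameters are in a different order, and against one where `Expires` is missing. The constant name `PRESIGNED` is made up for this example, and the character classes are the ones from the session above, not a statement about real AWS key formats.

```ruby
# Same structure as the pattern above, stored in a constant (hypothetical name).
PRESIGNED = /
  \A
  (?<scheme>https?):\/\/
  (?<host>[^\/]+)
  (?<path>\/.+)\?
  (?=.*(?:[?&]|\b)AWSAccessKeyId=(?<aws_access_key_id>[a-f0-9]{1,32}))
  (?=.*(?:[?&]|\b)Expires=(?<expires>[0-9]+))
/x

# Parameters in a different order than in the original example.
reordered = "https://bucket.s3.amazonaws.com/path/file.txt?Signature=abcd&Expires=12345678&AWSAccessKeyId=abcd12345"
m = PRESIGNED.match(reordered)
puts m[:aws_access_key_id]  # abcd12345
puts m[:expires]            # 12345678

# Without Expires the second lookahead fails, so there is no match at all.
incomplete = "https://bucket.s3.amazonaws.com/path/file.txt?AWSAccessKeyId=abcd12345"
puts PRESIGNED.match(incomplete).inspect  # nil
```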

Answer 1 (score: 0)

We need a URL to work with:

url = "https://bucket.s3.amazonaws.com/path/file.txt?AWSAccessKeyId=somestuff&Expires=somemorestuff&Signature=evenmorestuff"

We also need to escape a bunch of things and do some non-greedy matching (.+?):

/https:\/\/bucket\.s3\.amazonaws\.com\/path\/file\.txt\?AWSAccessKeyId=.+?&Expires=.+?&Signature=.+/.match(url)

 => #<MatchData "https://bucket.s3.amazonaws.com/path/file.txt?AWSAccessKeyId=somestuff&Expires=somemorestuff&Signature=evenmorestuff">
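If escaping by hand gets tedious, one option is to let `Regexp.escape` handle the fixed prefix and to capture each value with `[^&]+`, which stops at the next `&` without needing a non-greedy quantifier. This is only a sketch: the capture names `key`, `expires` and `signature` are chosen for illustration, and the pattern assumes the parameters always arrive in this exact order.

```ruby
# Regexp.escape takes care of the dots and the question mark in the literal prefix.
prefix = Regexp.escape("https://bucket.s3.amazonaws.com/path/file.txt?")

# [^&]+ matches one parameter value; \A and \z anchor the whole string.
pattern = /\A#{prefix}AWSAccessKeyId=(?<key>[^&]+)&Expires=(?<expires>[^&]+)&Signature=(?<signature>[^&]+)\z/

url = "https://bucket.s3.amazonaws.com/path/file.txt?AWSAccessKeyId=somestuff&Expires=somemorestuff&Signature=evenmorestuff"
m = pattern.match(url)
puts m[:key]        # somestuff
puts m[:expires]    # somemorestuff
puts m[:signature]  # evenmorestuff
```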