Question

I have a URL string:

http://localhost:3000/user/event?profile_id=2&profile_type=UserProfile

I want to extract "2" and "UserProfile"; both values can change.

I tried using both match and scan, but neither returns a result:

url = "http://localhost:3000/user/event?profile_id=2&profile_type=UserProfile"
m = /http(s)?:\/\/(.)+\/user\/event?profile_id=(\d)&profile_type=(\w)/.match(url)
=> nil 

url.scan /http(s)?:\/\/(.)+\/user\/event?profile_id=(\d)&profile_type=(\w)/
=> []

Any idea what I might be doing wrong?

Answer 1

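Don't try to do this with a pattern. Query parameters in a URL are not positional, so their order can change, and that would immediately break the pattern.

Instead, use a tool designed for the purpose, such as the built-in URI: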

require 'uri'

uri = URI.parse('http://localhost:3000/user/event?profile_id=2&profile_type=UserProfile')

Hash[URI::decode_www_form(uri.query)].values_at('profile_id', 'profile_type') 
# => ["2", "UserProfile"]

Doing it this way guarantees you always get the right values in the expected order, which makes them easy to assign:

profile_id, profile_type = Hash[URI::decode_www_form(uri.query)].values_at('profile_id', 'profile_type')

Here are the intermediate steps, so you can see what is happening:

uri.query # => "profile_id=2&profile_type=UserProfile"
URI::decode_www_form(uri.query) # => [["profile_id", "2"], ["profile_type", "UserProfile"]]
Hash[URI::decode_www_form(uri.query)] # => {"profile_id"=>"2", "profile_type"=>"UserProfile"}
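
To illustrate the point about ordering, here is a minimal sketch that wraps the steps above in a small helper and runs it against a URL whose parameters are swapped. The method name query_values and the reordered URL are made up for this example; they are not part of the original answer.

```ruby
require 'uri'

# Hypothetical helper (name chosen for illustration): returns the values of
# the requested query parameters, in the order the keys are passed in.
def query_values(url, *keys)
  # uri.query is nil when the URL has no query string, so fall back to ''.
  query = URI.parse(url).query || ''
  URI.decode_www_form(query).to_h.values_at(*keys)
end

# Same parameters as in the question, but in the opposite order.
reordered = 'http://localhost:3000/user/event?profile_type=UserProfile&profile_id=2'

profile_id, profile_type = query_values(reordered, 'profile_id', 'profile_type')
```

Because the lookup is by key, the swapped order makes no difference, and a key that is absent from the URL simply comes back as nil.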

Answer 2

match = url.match(/https?:\/\/.+?\/user\/event\?profile_id=(\d+)&profile_type=(\w+)/)
p match.captures[0] #=> '2'
p match.captures[1] #=> 'UserProfile'

In your expression:

/http(s)?:\/\/(.)+\/user\/event?profile_id=(\d)&profile_type=(\w)/

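Everything you put inside () is captured by the regular expression. There is no need to put the s in parentheses, because ? only acts on the preceding character. Likewise, (.) is unnecessary, because + also only acts on the preceding character. The ? after event must be escaped as \?; unescaped, it makes the t optional instead of matching the literal question mark in the URL, which is why your pattern returns nil. Finally, (\w) should be (\w+), which says "one or more word characters" ('UserProfile' is more than one character). The same applies to (\d) if profile_id can have more than one digit.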
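
As a rough illustration of the escaping issue, the sketch below runs an unescaped and an escaped pattern against the URL from the question. The shortened patterns and the named groups (profile_id, profile_type) are my own choices for this example, not something the question or the answer above used.

```ruby
url = 'http://localhost:3000/user/event?profile_id=2&profile_type=UserProfile'

# Unescaped "?": the "t" becomes optional, and nothing matches the literal
# "?" that sits between "event" and "profile_id" in the URL.
unescaped = /event?profile_id=(\d+)/

# Escaped "\?": matches the literal question mark. Named groups are used
# here only to make the captures easier to read.
escaped = /event\?profile_id=(?<profile_id>\d+)&profile_type=(?<profile_type>\w+)/

unescaped_result = unescaped.match(url) # no match

m = escaped.match(url)
profile_id   = m[:profile_id]
profile_type = m[:profile_type]
```

Note that this pattern still depends on profile_id coming before profile_type, so it shares the ordering weakness described in Answer 1.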
