Question

I need to get an array of floats (both positive and negative) from a multiline string, e.g. -45.124, 1124.325, etc.

Here is what I tried:

text.scan(/(\+|\-)?\d+(\.\d+)?/)

Although it works fine on regex101 (capture group 0 matches everything I need), it doesn't work in Ruby code.

Any ideas why this happens and how I can improve it?

Answer 1

You should remove the capturing groups, or make them non-capturing:

text = " -45.124, 1124.325"
puts text.scan(/[+-]?\d+(?:\.\d+)?/)

See the demo. Output:

-45.124
1124.325

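See the scan documentation:

If the pattern contains no groups, each individual result consists of the matched string, $&. If the pattern contains groups, each individual result is itself an array containing one entry per group.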
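The difference the documentation describes can be seen in a short sketch. The sample string is an assumed input, chosen only for illustration:

```ruby
# Assumed sample input, not taken from the original question.
text = "-45.124, 1124.325"

# With capturing groups, scan returns one array per match,
# holding only the groups (nil for a group that did not participate).
with_groups = text.scan(/(\+|\-)?\d+(\.\d+)?/)

# With non-capturing groups, scan returns the full matched strings.
without_groups = text.scan(/[+-]?\d+(?:\.\d+)?/)

p with_groups    # [["-", ".124"], [nil, ".325"]]
p without_groups # ["-45.124", "1124.325"]
```

This is why the original pattern appears to "lose" the numbers in Ruby: the whole match (group 0) is not part of the result once the pattern contains groups.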

If you also need to match floats like .04, you can use [+-]?\d*\.?\d+. See another demo
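Since the question asks for an array of floats rather than strings, the matches could then be converted with to_f. A minimal sketch, assuming a made-up multiline input:

```ruby
# Assumed sample input spanning two lines.
text = "-45.124, 1124.325\n.04 and +7"

# \d*\.?\d+ allows a missing integer part, so ".04" matches too.
# map(&:to_f) turns each matched string into a Float.
numbers = text.scan(/[+-]?\d*\.?\d+/).map(&:to_f)

p numbers
```

Note that this looser pattern also accepts plain integers such as +7, which to_f converts to 7.0.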

Answer 2

([+-]?\d+\.\d+)

This assumes there is a leading digit before the decimal point.

See the demo at Rubular
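Because this pattern still contains one capture group, scan returns an array of one-element arrays, so a flatten (or map(&:first)) is probably needed afterwards. A small sketch with an assumed input:

```ruby
# Assumed sample input; "42" has no decimal part on purpose.
text = "-45.124, 1124.325, 42"

# One capture group => each result is a one-element array.
nested = text.scan(/([+-]?\d+\.\d+)/)

# flatten collapses the nested arrays into a flat list of strings.
flat = nested.flatten

p nested # [["-45.124"], ["1124.325"]]
p flat   # ["-45.124", "1124.325"]
```

The integer 42 is skipped because the pattern requires a decimal point followed by digits.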

Answer 3

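If you need capture groups for complex pattern matching but want .scan to return the entire matched expression, this may work for you.

Suppose you want to get the image URLs from this string, which is markdown text with HTML image tags: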

str = %(
Before
<img src="https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-1842z4b73d71">

After
<img src="https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-a235b84bf150.png">).strip

You might define a regular expression that matches only the URLs, perhaps using a Rubular example like this to build and test the Regexp:

image_regex = 
  /https\:\/\/(user-)?images.(githubusercontent|zenhubusercontent).com.*\b/

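Now, you don't need each sub capture group; you just want the whole expression back from your .scan. To get that, you can wrap the entire pattern in an outer capture group and use it like this: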

image_regex = 
  /(https\:\/\/(user-)?images.(githubusercontent|zenhubusercontent).com.*\b)/

str.scan(image_regex).map(&:first)
=> ["https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-1842z4b73d71",
 "https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-a235b84bf150.png"]

How does this actually work?

Since the pattern now has 3 capture groups, .scan alone returns an Array of arrays, with one entry per capture group:

str.scan(image_regex)
=> [["https://images.zenhubusercontent.com/11223344e051aa2c30577d9d17/110459e6-915b-47cd-9d2c-1842z4b73d71", nil, "zenhubusercontent"],
 ["https://user-images.githubusercontent.com/111222333/75255445-f59fb800-57af-11ea-9b7a-a235b84bf150.png", "user-", "githubusercontent"]]

Since we only want the first (outer) capture group, we can just call .map(&:first) on the result.
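As an alternative sketch (not part of the original answer), the block form of scan sets $& to the full match on each iteration, so the inner groups can stay and no outer wrapper group is needed. The input and its URLs below are hypothetical placeholders:

```ruby
# Hypothetical input; the URLs are placeholders, not real images.
str = <<~TEXT
  <img src="https://images.zenhubusercontent.com/abc/def">
  <img src="https://user-images.githubusercontent.com/123/ghi.png">
TEXT

# Same idea as the pattern above, with the literal dots escaped.
image_regex = /https:\/\/(user-)?images\.(githubusercontent|zenhubusercontent)\.com.*\b/

# In the block form, $& holds the whole match, regardless of groups.
urls = []
str.scan(image_regex) { urls << $& }

p urls
```

Making the inner groups non-capturing with (?:...) would have the same effect without a block, at the cost of losing access to the sub-matches.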

Capture groups do not work as expected with Ruby's scan method