Question

I have sample text data as follows:

1; abc; 111; 10-nov-2017 2; abc ; 222; 11-nov-2017 3; ABC; 333; 12-NOV-2017

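Given the two inputs abc and 11-nov-2017, I want to extract the string between them, i.e. 222.

How can I get this result using a regex? Is there any other way to achieve the same goal?

The actual data looks like this: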

113434; Axis Gold ETF; 2651.2868; 2651.2868; 2651.2868; 20-Nov-2017 113434; Axis Gold ETF; 2627.6778; 2627.6778; 2627.6778; 21-Nov-2017 113434; Axis Gold ETF; 2624.1880; 2624.1880; 2624.1880; 22-Nov-2017

Any help is much appreciated. Thanks!

Answer 1

Here are two ways to extract the desired substring, if it exists. We are given the following.

str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017"
before_str = "abc;"
date_str   = ";11-nov-2017"

I assume that the value of date_str appears at most once in str.

#1 Use a regular expression

r = /
    .*            # match any number of characters greedily
    #{before_str} # match the content of the variable 'before_str'
    (.*)          # match any number of characters greedily, in capture group 1
    #{date_str}   # match the content of the variable 'date_str'
    /x            # free-spacing regex definition mode
  #=> /.*abc;(.*);11-nov-2017/x

str[r,1]
  #=> "222"

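The key here is the .* at the beginning of the regex. Because it is greedy, it forces the next match to be the last instance of "abc;" (the value of before_str) that precedes ";11-nov-2017" (the value of date_str).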
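The effect of the leading .* can be checked directly. The following is a minimal sketch, not part of the answer's code: the method name between_regex is hypothetical, and Regexp.escape is added on the assumption that the inputs should be matched literally even if they contain regex metacharacters.

```ruby
# Hypothetical helper: returns the text between the last occurrence of
# before_str and the date_str that follows it, or nil if there is no match.
def between_regex(str, before_str, date_str)
  # Regexp.escape makes both inputs match literally.
  r = /.*#{Regexp.escape(before_str)}(.*)#{Regexp.escape(date_str)}/
  str[r, 1]
end

str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017"

with_prefix = between_regex(str, "abc;", ";11-nov-2017")   #=> "222"
first_rec   = between_regex(str, "abc;", ";10-nov-2017")   #=> "111"
no_match    = between_regex(str, "abc;", ";13-nov-2017")   #=> nil

# Without the leading .*, the match starts at the first "abc;" instead:
without_prefix = str[/abc;(.*);11-nov-2017/, 1]
  #=> "111;10-nov-2017 2;abc;222"
```

The last expression shows why the prefix matters: dropping it makes the capture group span two records.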

#2 Determine the indices of the beginning and end of the desired substring

idx_date = str.index(date_str)
  #=> str.index(";11-nov-2017") => 31
idx_before = str.rindex(before_str, idx_date-before_str.size)
  #=> str.rindex("abc;", 27) => 24
str[idx_before + before_str.size..idx_date-1]
  #=> str[24+4..31-1] => str[28..30] => "222"

If idx_date or idx_before is nil, one would return nil at that point, and the last expression would not be evaluated.
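The nil handling described here is not shown in the snippet itself, so here is one way it might be written. This is a sketch only: the method name between_index is hypothetical, and the extra guard for a negative search position is an assumption added because String#rindex interprets a negative second argument as an offset from the end of the string.

```ruby
# Hypothetical helper wrapping the index-based approach with nil guards.
def between_index(str, before_str, date_str)
  idx_date = str.index(date_str)
  return nil if idx_date.nil?            # date_str not present

  # Latest position at which before_str could start and still end
  # at or before idx_date.
  start_limit = idx_date - before_str.size
  return nil if start_limit < 0          # no room for before_str

  idx_before = str.rindex(before_str, start_limit)
  return nil if idx_before.nil?          # before_str not found before the date

  str[idx_before + before_str.size..idx_date - 1]
end

str = "1;abc;111;10-nov-2017 2;abc;222;11-nov-2017 3;abc;333;12-nov-2017"

between_index(str, "abc;", ";11-nov-2017")  #=> "222"
between_index(str, "abc;", ";12-nov-2017")  #=> "333"
between_index(str, "abc;", ";13-nov-2017")  #=> nil
between_index(str, "xyz;", ";11-nov-2017")  #=> nil
```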

See String#rindex, particularly the function of the optional second argument.

(One could write str[idx_before + before_str.size...idx_date], but I have found three-dot ranges to be a potential source of bugs, so I always use two dots.)

Answer 2

You can check the result of the following: /abc(.*?)10-nov-2017/g.exec("1;abc;111;10-nov-2017 2; abc; 222; 11-nov-2017 3; abc; 333; 12-nov-2017")[1]
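For comparison, here is a rough Ruby rendering of the same lazy pattern, using the same input string. It is a sketch for illustration, not the answerer's code. The final pattern, which adds a greedy prefix and \s* around the separators, is an assumption about how one might handle the spaces in this input.

```ruby
str = "1;abc;111;10-nov-2017 2; abc; 222; 11-nov-2017 3; abc; 333; 12-nov-2017"

# The lazy group stops at the first "10-nov-2017" after the first "abc".
# The separators are part of the capture.
lazy_first = str[/abc(.*?)10-nov-2017/, 1]
  #=> ";111;"

# With a later date, the match still starts at the first "abc",
# so the capture runs across records.
lazy_second = str[/abc(.*?)11-nov-2017/, 1]
  #=> ";111;10-nov-2017 2; abc; 222; "

# A greedy .* prefix moves the start to the last "abc" before the date;
# \s* absorbs the optional spaces around the semicolons.
trimmed = str[/.*abc\s*;\s*(.*?)\s*;\s*11-nov-2017/, 1]
  #=> "222"
```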

Extract text between two strings that are repeated multiple times
