I have this block of text:
XQuery programming language
C# programming language
declarative programming
XSLT programming language
Haskell programming language vs F* programming language
I want to retrieve the names of the programming languages.
I tried something like this:
matches = string.scan('/(\w)*\sprogramming language/i')
But this gives me:
[]
[]
[]
[]
whereas I want an array like this:
['XQuery','C#','XSLT','Haskell']
What am I doing wrong?
Answer 0: (score: 6)
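You need to remove the quotes around the regex literal /.../i. A quoted pattern is a plain string, so scan searches for those literal characters instead of applying a regex.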
string.scan(/\S+(?=\sprogramming language)/i)
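\S+ matches one or more non-whitespace characters. (?=\sprogramming language) is a positive lookahead asserting that the match must be followed by a whitespace character and the string programming language. The i modifier makes the regex engine match case-insensitively.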
irb(main):001:0> str = "XQuery programming language
irb(main):002:0" C# programming language
irb(main):003:0" declarative programming
irb(main):004:0" XSLT programming language
irb(main):005:0" Haskell programming language vs F* programming language"
=> "XQuery programming language\nC# programming language\ndeclarative programming\nXSLT programming language\nHaskell programming language vs F* programming language"
irb(main):007:0> str.scan(/\S+(?=\sprogramming language)/i)
=> ["XQuery", "C#", "XSLT", "Haskell", "F*"]
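To see why the quoted version finds nothing, here is a minimal sketch. It relies on the documented behaviour that String#scan treats a String argument as literal text rather than compiling it into a regex. The variable names quoted and unquoted are only illustrative.

```ruby
text = <<~TEXT
  XQuery programming language
  C# programming language
  declarative programming
  XSLT programming language
  Haskell programming language vs F* programming language
TEXT

# A quoted pattern is a plain String, so scan looks for those exact
# characters (slashes, backslashes and all) inside the text.
quoted = text.scan('/\S+(?=\sprogramming language)/i')

# A regex literal is compiled and applied as a pattern.
unquoted = text.scan(/\S+(?=\sprogramming language)/i)

p quoted
p unquoted
```

The first call comes back empty because the literal characters never occur in the text; the second returns the five names shown in the irb session above.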
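Answer 1: (score: 1)
You only need a small change to what you already have. I assume the text you want always starts at the beginning of a line (since you excluded 'F*') and is separated from "programming language" by one or more spaces.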
text =<<_
XQuery programming language
C# programming language
declarative programming
XSLT programming language
Haskell programming language vs F* programming language
_
text.scan(/(^.+?)\s+programming language/i).flatten
#=> ["XQuery", "C#", "XSLT", "Haskell"]
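Notes:
^ is the beginning-of-line anchor. It is zero-width and consumes no characters, so it can sit inside the capture group, as in (^.+?), or in front of it, as in ^(.+?); both forms return the same array here, and the third line simply produces no match.
The first ? in the regex makes .+ "non-greedy". Without it, the last element of the returned array would be:
"Haskell programming language vs F*"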
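The effect of the non-greedy quantifier can be checked directly. The sketch below runs both variants of the pattern on the same text; lazy and greedy are illustrative names, not part of the original answer.

```ruby
text = <<~TEXT
  XQuery programming language
  C# programming language
  declarative programming
  XSLT programming language
  Haskell programming language vs F* programming language
TEXT

# .+? stops at the first " programming language" on each line.
lazy = text.scan(/(^.+?)\s+programming language/i).flatten

# .+ runs to the end of the line and backtracks to the last
# " programming language" on that line.
greedy = text.scan(/(^.+)\s+programming language/i).flatten

p lazy         # => ["XQuery", "C#", "XSLT", "Haskell"]
p greedy.last  # => "Haskell programming language vs F*"
```

Only the last line contains the phrase twice, so that is the only element where the two variants differ.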