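Regex that includes single hyphens but not double hyphens

Date: 2016-07-13 03:17:23

Tags: ruby regex

I'm trying to write a regex in which a word with a single hyphen is counted as one word, but a word with a double hyphen is counted as two. This is what I have so far: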

/\b([a-zA-Z0-9’'-])+\b/

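What do I need to change to make this work?

Edit: To clarify, I'm using this regex to count words.

Examples: single-dash (1 word), double--dash (2 words)

I tried adding a negative lookahead as suggested, but now it cuts off all words before the double dash (link)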

3 answers:

Answer 0 (score: 1)

First, \b is not a good choice: the position between a word character and an adjacent hyphen is itself matched by \b.
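As a quick illustration of that point (a sketch, not part of the original answer), marking every \b position in a double-hyphenated string shows a boundary on each side of the hyphen run, so the question's pattern still spans the whole string. The capture group is dropped here so that scan returns whole matches, and the sample string is arbitrary.

```ruby
# Mark each word-boundary position with a pipe.
marked = 'goo--bar'.gsub(/\b/, '|')
puts marked  # |goo|--|bar|

# The question's pattern, minus the capture group so scan returns whole matches.
original = /\b[a-zA-Z0-9’'-]+\b/
words = 'goo--bar'.scan(original)
p words  # ["goo--bar"]
```

The double-hyphenated string comes back as a single match, which is the behaviour the question wants to avoid.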

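The regex below validates a whole string; the negative lookahead in (-(?!-)) lets a hyphen match only when it is not followed by another hyphen.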

/\A(['’\p{Alnum}]|(-(?!-)))+\z/
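A minimal sketch of how this whole-string pattern behaves. The sample strings are my own, and match? assumes Ruby 2.4 or later.

```ruby
# Whole-string check: hyphens are allowed, but never two in a row.
single_hyphens_only = /\A(['’\p{Alnum}]|(-(?!-)))+\z/

results = %w[goo goo-bar goo--bar goo-bar--baz].map do |s|
  [s, single_hyphens_only.match?(s)]
end

results.each { |s, ok| puts "#{s}: #{ok}" }
# goo: true
# goo-bar: true
# goo--bar: false
# goo-bar--baz: false
```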

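If you still need a regex that matches only "words" (whatever that means) containing single hyphens, you should explicitly specify which symbols act as word breakers: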

re = /(?<![\p{Alnum}'’-])((['’\p{Alnum}]|(-(?!-)))+)(?![\p{Alnum}'’-])/ 
'goo goo-bar goo--bar, goo-bar--baz'.scan(re).map &:first
#⇒ ["goo", "goo-bar"]

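Answer 1 (score: 1)

This assumes the English alphabet, at most one symbol (any of [’'-]) between runs of alphanumeric characters, and at most one symbol at the beginning and at the end of a "word" (with "word" as defined in the question).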

[’']?\b[a-zA-Z0-9]+(?:[’'-][a-zA-Z0-9]+)*\b[’']?

Test cases:

"Us and Them"’s inclusion on the album The Dark Side of the Moon
You Am I’s latest CD
The 69’ers’ drummer, Tom Callaghan (only the second apostrophe is possessive)
His ’n’ Hers’ first track is called "Joyriders".[18]
Was She's success greater, or King Solomon’s Mines's?
Rock 'n' Roll
’bout for about, ’less for unless, ’twas for it was
’70s for 1970s 
You-Know-Who
the fo’c’s’le’s timbers
Three-hundred-year-old trees are an indeterminate number of trees that are each aged 300 years.
syl-la-bi-fi-ca-tion
double--hyphen

Demo at Rubular
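For readers without access to the demo, here is a small sketch that applies this answer's pattern with scan to three of the test cases; the variable names are only for illustration.

```ruby
# The pattern from this answer, applied with scan to a few of the test cases.
word = /[’']?\b[a-zA-Z0-9]+(?:[’'-][a-zA-Z0-9]+)*\b[’']?/

samples = ['You-Know-Who', 'double--hyphen', "Rock 'n' Roll"]
tokens = samples.map { |s| s.scan(word) }

tokens.each { |t| p t }
# ["You-Know-Who"]
# ["double", "hyphen"]
# ["Rock", "'n'", "Roll"]
```

Calling size on each scan result then gives the word count for that string.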

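Answer 2 (score: 1)

As I understand it, the objective is to count words, where a word containing two consecutive hyphens is counted as two words. Rather than trying to do everything in one regex, I replace every run of two or more consecutive hyphens with a space, which splits such a word in two, and then simply count the words.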

def count_words(str)
  str.gsub(/-{2,}/, ' ').scan(/[a-zA-Z0-9’'-]+/).size
end

I will demonstrate with part of @nhahtdh's test string.

str =<<BITTER_END
"Us and Them"’s inclusion on the album The Dark Side of the Moon
You Am I’s latest CD
The 69’ers’ drummer, Tom Callaghan (only the second apostrophe is possessive)
His ’n’ Hers’ first track is called "Joyriders".[18]
Was She's success greater, or King Solomon’s Mines's?
Rock 'n' Roll
’bout for about, ’less for unless, ’twas for it was
’70s for 1970s
BITTER_END

  #=> "\"Us and Them\"’s inclusion on the album The Dark Side of the Moon\nYou Am I’s latest CD\nThe 69’ers’ drummer, Tom Callaghan (only the second apostrophe is possessive)\nHis ’n’ Hers’ first track is called \"Joyriders\".[18]\nWas She's success greater, or King Solomon’s Mines's?\nRock 'n' Roll\n’bout for about, ’less for unless, ’twas for it was\n’70s for 1970s\n"

count_words(str) #=> 63

@nhahtdh's and @mudasobwa's regexes give the same count (63) for the above str.
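As a smaller check against the two examples given in the question, the sketch below repeats the count_words definition so that it runs on its own. The third input is my own addition, included to show one edge case.

```ruby
# Same approach as count_words above: break on runs of hyphens, then count tokens.
def count_words(str)
  str.gsub(/-{2,}/, ' ').scan(/[a-zA-Z0-9’'-]+/).size
end

counts = ['single-dash', 'double--dash', 'foo - bar'].map { |s| count_words(s) }
p counts  # [1, 2, 3]
```

Note that a free-standing hyphen surrounded by spaces is itself counted as a token, because the hyphen is part of the character class. If that matters for the input, it would need separate handling.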