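I'm looking for a regular expression to use with Ruby's #gsub that strips all digits from a string except those in ordinals. Suppose the following keeps what I want in the string:
string = "50th red balloon"
How can I modify the regex in strip_digits so that, if:
=> "50th red balloon"
strip_digits would return:
{{1}}
In other words, the regex should skip digits that are part of an ordinal and match all other digits.
For this example, it is safe to assume that any run of digits immediately followed by an ordinal indicator ("nd", "th", "rd", or "st") is an ordinal.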
Answer 0 (score: 1)
As a "fix" for your regex, I suggest:
input.gsub(/(\d+(?:th|[rn]d|st))|[^a-z\s]/i, "\\1")
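The logic is as follows: match every run of digits that is followed by an ordinal suffix and capture it into group 1, then restore that value with the \1 backreference in the replacement pattern. Every other character that is neither a letter nor whitespace is matched by [^a-z\s] (or [^\p{L}\s]) and removed.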
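The capture-and-restore idea above can be sketched as follows. This is only an illustration: the helper name `strip_non_ordinals`, the `unicode:` keyword, and the sample strings are assumptions, not part of the original answer.

```ruby
# Pattern from the answer, plus the \p{L} variant it mentions.
ASCII_PATTERN   = /(\d+(?:th|[rn]d|st))|[^a-z\s]/i
UNICODE_PATTERN = /(\d+(?:th|[rn]d|st))|[^\p{L}\s]/i

# Hypothetical helper (name and keyword argument are assumptions).
def strip_non_ordinals(input, unicode: false)
  pattern = unicode ? UNICODE_PATTERN : ASCII_PATTERN
  # Group 1 is set only when an ordinal matched, so "\\1" writes the ordinal
  # back. For any other match the group is empty and the character is dropped.
  input.gsub(pattern, "\\1")
end

puts strip_non_ordinals("50th red balloon")     # 50th red balloon
puts strip_non_ordinals("the 2nd of 22 books")  # the 2nd of  books (two spaces remain)
puts strip_non_ordinals("50th caf\u00e9 7", unicode: true)
```

Note that the second alternative removes punctuation as well as digits, and that the spaces around a removed number are left in place.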
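Pattern details:
(\d+(?:th|[rn]d|st)) - Group 1 matches one or more digits (\d+) followed by th, rd, nd, or st. The whole substring is stored in numbered buffer #1, which the \1 backreference in the replacement pattern reads back.
| - or
[^a-z\s] - any character other than an ASCII letter (the case-insensitive /i modifier makes the class cover both lowercase and uppercase letters) or whitespace. To avoid removing Unicode letters, use \p{L} instead of a-z.
Answer 1 (score: 0)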
You can use word boundaries (\b), i.e.:
strip_digits = string.gsub(/\b\d+(?!st|th|rd|nd)\b/, '')
Regex explanation:
\b\d+(?!st|th|rd|nd)\b
  Assert position at a word boundary (position preceded or followed, but not both, by a Unicode letter, digit, or underscore) «\b»
  Match a single character that is a “digit” (ASCII 0–9 only) «\d+»
    Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
  Assert that it is impossible to match the regex below starting at this position (negative lookahead) «(?!st|th|rd|nd)»
    Match this alternative (attempting the next alternative only if this one fails) «st»
      Match the character string “st” literally (case sensitive) «st»
    Or match this alternative (attempting the next alternative only if this one fails) «th»
      Match the character string “th” literally (case sensitive) «th»
    Or match this alternative (attempting the next alternative only if this one fails) «rd»
      Match the character string “rd” literally (case sensitive) «rd»
    Or match this alternative (the entire group fails if this one fails to match) «nd»
      Match the character string “nd” literally (case sensitive) «nd»
  Assert position at a word boundary (position preceded or followed, but not both, by a Unicode letter, digit, or underscore) «\b»
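To see how the word-boundary pattern behaves in practice, here is a small sketch. The constant name, the helper `strip_standalone_digits`, and the sample strings are assumptions made for illustration; only the regex itself comes from the answer.

```ruby
# Pattern from the answer; everything else here is illustrative.
STANDALONE_DIGITS = /\b\d+(?!st|th|rd|nd)\b/

# Hypothetical helper: removes standalone numbers, then tidies the spacing.
# Caveat: squeeze(' ') also collapses double spaces that were already there.
def strip_standalone_digits(text)
  text.gsub(STANDALONE_DIGITS, '').squeeze(' ').strip
end

puts "50th red balloon".gsub(STANDALONE_DIGITS, '')  # 50th red balloon
puts "his 22 books".gsub(STANDALONE_DIGITS, '')      # his  books (two spaces remain)
puts strip_standalone_digits("his 22 books")         # his books
puts "room42 and 7up".gsub(STANDALONE_DIGITS, '')    # room42 and 7up
```

Because both ends of the match must sit on a word boundary, digits glued to letters (as in "room42" or "7up") are left alone. Whether that is desirable depends on the input you expect.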
Answer 2 (score: 0)
You can use a negative lookahead (this also collapses the extra whitespace):
t = "And on 3rd day, he created the 1st of his 22 books, not including the 3 that were never published - this was the 2nd time this happened."
print(t.gsub(/\s*\d+(?!st|th|rd|nd)\s*/, " "))
#=> And on 3rd day, he created the 1st of his books, not including the that were never published - this was the 2nd time this happened.
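One caveat with the lookahead-only pattern /\s*\d+(?!st|th|rd|nd)\s*/: with a multi-digit ordinal such as "50th", \d+ can backtrack to "5", where the lookahead sees "0t" and succeeds, so the leading digit is replaced. The sketch below shows the effect and one possible guard: adding \d to the lookahead. The guard, the constant names, and the sample string are assumptions, not part of the original answer.

```ruby
ORIGINAL = /\s*\d+(?!st|th|rd|nd)\s*/
# Assumed variant: the extra \d alternative stops the match from ending
# in the middle of a run of digits.
GUARDED  = /\s*\d+(?!\d|st|th|rd|nd)\s*/

SAMPLE = "50th red balloon and 22 more"

puts SAMPLE.gsub(ORIGINAL, " ").inspect  # " 0th red balloon and more"
puts SAMPLE.gsub(GUARDED, " ").inspect   # "50th red balloon and more"
```

With the guard in place, a run of digits is either removed whole or, when an ordinal suffix follows, left untouched.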