How to remove special characters and multiple spaces from a string

Time: 2018-09-12 05:42:32

Tags: ruby

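I want to remove all special characters (including spaces) from the beginning and end of a string, and replace each run of consecutive spaces with a single space. For example,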

"      !:;:§"   this string is normal.   "§$"§"$"§$    $"$§"     "

should become:

"this string is normal"

I would also like to allow ! and ? at the end of the string:

"      !:;:§"   this  string is normal?   "§$"§"$"§$    $"$§"      "
"      !:;:§"   this string    is very normal!   "§$"§"$"§$    $"$§"      "
"      !:;:§"   this string is     very normal!?   "§$"§"$"§$    $"$§"      "

should become:

"this string is normal?"
"this string is very normal!"
"this string is very normal!?"

This is all to get nice-looking titles in an application.

Can anyone help me? Or does anyone know a good regular expression for producing nice titles?

2 answers:

Answer 0 (score: 2)

Do it step by step:

str.
  gsub(/\A\W+/, ''). # remove garbage from the very beginning
  gsub(/\W*\z/) { |m| m[/\A\p{Punct}*/] }. # leave trailing punctuation
  gsub(/\s{2,}/, ' ') # squeeze
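
The chain above keeps any trailing punctuation, so the period in the first example from the question would survive. A minimal sketch of a stricter variant follows, assuming only `!` and `?` should be kept at the end. The method name `clean_title` is hypothetical and not part of the answer. Note that `\W` in Ruby is ASCII-based, so a title that begins or ends with an accented letter would need a different character class.

```ruby
# Hypothetical helper; the name clean_title is an assumption made for this sketch.
def clean_title(str)
  str
    .sub(/\A\W+/, '')                   # drop non-word characters at the very beginning
    .sub(/\W*\z/) { |m| m[/\A[!?]*/] }  # from the trailing run, keep only leading ! and ?
    .gsub(/\s{2,}/, ' ')                # collapse runs of whitespace into one space
end

puts clean_title('      !:;:§"   this string is normal.   "§$"§"$"§$    $"$§"     ')
# this string is normal
puts clean_title('      !:;:§"   this string is     very normal!?   "§$"§"$"§$    $"$§"      ')
# this string is very normal!?
```

Since both anchored patterns can match at most once per string, `sub` is enough for the first two steps; only the whitespace step needs `gsub`.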

Answer 1 (score: 2)

R = /
    (?:           # begin a non-capture group
      \p{Alnum}+  # match one or more alphanumeric characters
      [ ]+        # match one or more spaces
    )*            # end non-capture group and execute zero or more times
    \p{Alnum}+    # match one or more alphanumeric characters
    [!?]*         # match zero or more characters '!' and '?'
    /x            # free-spacing regex definition mode

def extract(str)
  str[R].squeeze(' ')
end

arr = [
  '      !:;:§"   this  string is normal?   "§$"§"$"§$    $"$§"      ',
  '      !:;:§"   this string    is very normal!   "§$"§"$"§$    $"$§"      ',
  '      !:;:§"   this string is     very normal!?   "§$"§"$"§$    $"$§"      ',
  '      !:;:§"   cette  chaîne  est normale?   "§$"§"$"§$    $"$§"    '
]
arr.each { |s| puts extract(s) }

prints

this string is normal?
this string is very normal!
this string is very normal!?
cette chaîne est normale?
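
One edge case the sample strings do not cover: `str[R]` returns `nil` when the string contains no alphanumeric character at all, and calling `squeeze` on `nil` raises `NoMethodError`. A guarded variant might look like the sketch below. The names `TITLE` and `extract_safe` are hypothetical, and returning an empty string in that case is an assumption about what the application wants.

```ruby
# Same pattern as R above, written on one line under a different (hypothetical) name.
TITLE = /(?:\p{Alnum}+ +)*\p{Alnum}+[!?]*/

# Hypothetical variant of extract: returns '' instead of raising when nothing matches.
def extract_safe(str)
  str[TITLE].to_s.squeeze(' ') # nil.to_s is '', so squeeze is always called on a String
end

p extract_safe('  !:;:§"   this  string is normal?   "§$"  ')  # "this string is normal?"
p extract_safe('  !:;:§"  ')                                   # ""
```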

See the Regexp documentation for \p{Alnum} (search for "\p{} construct").
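
To illustrate why `\p{Alnum}` is used rather than an ASCII range, here is a small sketch, assuming a UTF-8 source file and a precomposed accented letter. It explains why the French sample string above survives intact.

```ruby
word = 'chaîne'

p word[/\p{Alnum}+/]    # "chaîne": the accented letter counts as alphanumeric
p word[/[a-zA-Z0-9]+/]  # "cha": an ASCII-only class stops at the accented letter
```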

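I wrote the regular expression in free-spacing mode so that each step could be documented. Conventionally it would be written as follows: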

/(?:\p{Alnum}+ +)*\p{Alnum}+[!?]*/

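Note that in free-spacing mode I put the space inside a character class. Had I not done so, the space would have been stripped out before the regular expression was evaluated.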
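
The effect of the character class can be checked with two tiny patterns. This is only an illustrative sketch; the variable names are made up for the example.

```ruby
bare    = /a b/x    # in free-spacing mode the bare space is ignored: same as /ab/
bracket = /a[ ]b/x  # the character class keeps the space as a literal

p bare.match?('ab')      # true
p bare.match?('a b')     # false
p bracket.match?('a b')  # true
p bracket.match?('ab')   # false
```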

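If non-alphanumeric characters (other than spaces) are to be allowed in the interior of the string, change the regular expression to the following.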

def extract(str)
  str.gsub(R,'')
end

R = /
    \A              # match the beginning of the string
    [^\p{Alnum}]+   # match one or more non-alphanumeric characters
    |               # or
    [^\p{Alnum}!?]  # match a character other than an alphanumeric, '!' and '?'
    [^\p{Alnum}]+   # match one or more non-alphanumeric characters
    \z              # match the end of the string
    |               # or
    [ ]             # match a space...
    (?=[ ])         # ...followed by a space
    /x              # free-spacing regex definition mode

extract '  !:;:§"   this  string $$ is abnormal?   "§$"  $"$§"  '

returns

"this string $$ is abnormal?"
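
The trailing branch of this pattern needs at least two characters to match, so a single trailing space or symbol is left in place. If that matters, one possible adjustment is to use `*` instead of `+` in that branch. The sketch below is a variant under that assumption, not the answer's own code, and the names `TRIM` and `tidy` are hypothetical.

```ruby
# Variant of the pattern above: the trailing branch uses * instead of +,
# so a lone trailing character is stripped as well.
TRIM = /
    \A[^\p{Alnum}]+                # leading run of non-alphanumeric characters
    |
    [^\p{Alnum}!?][^\p{Alnum}]*\z  # trailing run, starting at the first character that is neither ! nor ?
    |
    [ ](?=[ ])                     # a space that is followed by another space
    /x

# Hypothetical helper name, chosen for this sketch.
def tidy(str)
  str.gsub(TRIM, '')
end

p tidy('  !:;:§"   this  string $$ is abnormal?   "§$"  $"$§"  ')  # "this string $$ is abnormal?"
p tidy('one  trailing space ')                                     # "one trailing space"
```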