How to remove zero-width space characters from text

Posted: 2018-06-01 16:56:21

Tags: ruby string ruby-on-rails-4

My text contains &zwj; (a zero-width joiner), which is not visible in the UI, but when I send the text as an SMS it shows up on the iPhone as a ? (question mark).

I tried removing it with gsub, but it was not removed:

text.gsub("&zwj\;", "")

Is there a way to remove this kind of invisible character from text?

Update

In addition to @matt's answer:

Unicode has the following zero-width characters:

  • U+200B zero-width space
  • U+200C zero-width non-joiner
  • U+200D zero-width joiner
  • U+FEFF zero-width no-break space

To remove them from text, you can use a simple regular expression:

text = text.gsub(/[\u200B-\u200D\uFEFF]/, '')
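A minimal sketch that wraps the same character class in a reusable helper, so it can be applied wherever outgoing SMS text is built. The constant ZERO_WIDTH_CHARS and the method strip_zero_width are hypothetical names chosen for illustration:

```ruby
# Hypothetical helper; the character class is the one from the regex above:
# U+200B..U+200D (zero-width space, non-joiner, joiner) plus U+FEFF.
ZERO_WIDTH_CHARS = /[\u200B-\u200D\uFEFF]/

def strip_zero_width(text)
  # gsub returns a new String and leaves the argument unchanged.
  text.gsub(ZERO_WIDTH_CHARS, '')
end

sample = "Hello\u200D World\u200B!\uFEFF"
p strip_zero_width(sample)  # => "Hello World!"
```

To modify the string in place, text.gsub!(ZERO_WIDTH_CHARS, '') also works, but gsub! returns nil when nothing was removed, so don't chain further calls on its return value.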

2 answers:

Answer 0 (score: 1)

"blah blah blah".gsub(/[^[:print:]]/, '')

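This should remove all non-printable characters. Note that for UTF-8 strings Ruby defines [[:print:]] using its own Unicode tables, so it is worth checking that it actually treats the characters you care about, such as U+200D, as non-printable.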
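A narrower alternative, swapped in here and not part of this answer, is to target Unicode general category Cf ("Format") explicitly with Ruby's \p{Cf} property, which covers all four zero-width characters listed in the question's update. The method name strip_format_chars is hypothetical:

```ruby
# Alternative sketch: remove every character in Unicode general category Cf
# ("Format"). That category contains U+200B, U+200C, U+200D and U+FEFF, but also
# other invisible characters such as U+00AD (soft hyphen) and U+200E/U+200F
# (left-to-right / right-to-left marks).
def strip_format_chars(text)
  text.gsub(/\p{Cf}/, '')
end

p strip_format_chars("zero\u200Bwidth soft\u00ADhyphen")  # => "zerowidth softhyphen"
```

The trade-off is breadth: stripping U+200D also splits emoji ZWJ sequences (for example family emoji) into their component emoji, and dropping the directional marks can change how right-to-left text is displayed.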

Answer 1 (score: 1)

The string &zwj; is the HTML character entity for the zero-width joiner. When a web browser sees it, it replaces it with an actual zero-width joiner, but as far as Ruby is concerned it is just a 5-character string.

What you want to do is specify the actual zero-width joiner character. Its code point is U+200D, so you can write it with Ruby's Unicode escape:

text.gsub("\u200D", "")

This removes the zero-width joiner characters themselves, rather than searching for the literal string &zwj;, which is what your original code was doing.
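A short sketch of the difference this answer describes: the entity is five ordinary characters, while "\u200D" is a single character, so only the latter matches what is actually in the string. The final helper strip_zwj is hypothetical and assumes that some input might still carry the unrendered entity (for example raw HTML), in which case one pattern can remove both forms:

```ruby
entity = "&zwj;"   # five literal characters: & z w j ;
zwj    = "\u200D"  # a single character, code point U+200D

puts entity.length     # prints 5
puts zwj.length        # prints 1
puts zwj.ord.to_s(16)  # prints 200d

text = "ab\u200Dc"     # 4 characters, one of them invisible
puts text.gsub("&zwj;", "").length   # prints 4: the entity never occurs, nothing removed
puts text.gsub("\u200D", "").length  # prints 3: the joiner itself is removed

# Hypothetical helper: removes both the literal entity text and the real character.
def strip_zwj(text)
  text.gsub(/&zwj;|\u200D/, '')
end

puts strip_zwj("a&zwj;b\u200Dc")  # prints abc
```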