Question

> [x[1].txt,x[0].txt]
[
    [0] "Put your weight on to the shoulders and upper back.",
    [1] "Put your weight on to the shoulders and upper back."
]
> [x[1].txt,x[0].txt].map &:class
[
    [0] String < Object,
    [1] String < Object
]
> x[1].txt == x[0].txt
false

How is this possible?

Update

After some reading, I found this:

y = x.map{|z| z.txt.toutf8 }
[
    [0] "Put your weight on to the shoulders and upper back.",
    [1] "窶ィPut your weight on to the shoulders and upper back.",
    [2] "窶ィPut your weight on to the shoulders and upper back."
]

So the strings are not the same, but without .toutf8 they look exactly identical. Why is that?

Most importantly, how do I remove these characters?

Answer 1

The strings may be in different encodings. To find out the encoding of each string, try the following:

[x[1].txt.encoding,x[0].txt.encoding]

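If the encodings do turn out to differ, the problem may originate at an interface (for example a view, a REST API endpoint, or a file source), or it may be a storage/conversion issue in the database.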
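
When the encodings match but the strings still compare as unequal, looking at the codepoints can expose an invisible character. Here is a minimal sketch. The two literals are hypothetical stand-ins for x[0].txt and x[1].txt, and it assumes the hidden character is U+2028 (LINE SEPARATOR), whose UTF-8 bytes E2 80 A8 would plausibly show up as 窶ィ if toutf8 misread them as Shift_JIS:

```ruby
# Hypothetical stand-ins for x[0].txt and x[1].txt (not the asker's real data).
clean = "Put your weight on to the shoulders and upper back."
# Assumption: the invisible prefix is U+2028 LINE SEPARATOR.
dirty = "\u2028Put your weight on to the shoulders and upper back."

p clean == dirty                    # => false
p clean.encoding == dirty.encoding  # => true, so the encoding is not the culprit here

# List every non-ASCII character as a U+XXXX codepoint.
hidden = dirty.each_char.reject(&:ascii_only?).map { |c| format("U+%04X", c.ord) }
p hidden                            # => ["U+2028"]
p dirty.bytes.first(3)              # => [226, 128, 168]
```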

If your string encodings do not match, you can do the following:

x.map {|text| text.encode!("UTF-8", invalid: :replace, undef: :replace).force_encoding("utf-8") }
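
As a rough illustration of what the transcoding step does, the sketch below uses a hypothetical Latin-1 string (not taken from the question). Two strings that render identically but are stored in different encodings compare as unequal until one of them is transcoded:

```ruby
# Hypothetical example: "café" stored as ISO-8859-1 bytes vs. as UTF-8.
latin1 = "caf\xE9".dup.force_encoding("ISO-8859-1")
utf8   = "caf\u00E9"

p latin1 == utf8         # => false (different bytes, different encodings)

# encode (without !) returns a transcoded copy and leaves latin1 untouched.
converted = latin1.encode("UTF-8", invalid: :replace, undef: :replace)
p converted == utf8      # => true
puts converted.encoding  # UTF-8
```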

If your encodings already match, you can use this gsub call to remove the non-ASCII characters from the strings:

x.map {|text| text.gsub(/[^\001-\176]+/, "") }

After doing this, you will get the following:

[
  [0] "Put your weight on to the shoulders and upper back.", 
  [1] "Put your weight on to the shoulders and upper back.", 
  [2] "Put your weight on to the shoulders and upper back."
]

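The regular expression removes every character that is not between ASCII code 1 (octal 001) and ASCII code 126 (octal 176). This effectively strips the strings of all non-ASCII characters (as well as ASCII 0 and DEL, code 127).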
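
A short sketch of that pattern in action, using hypothetical sample strings with a U+2028 prefix assumed as the hidden character. It calls gsub rather than gsub!, because gsub! returns nil for any string in which nothing was replaced:

```ruby
NON_ASCII = /[^\001-\176]+/   # same pattern as above

samples = [
  "Put your weight on to the shoulders and upper back.",
  "\u2028Put your weight on to the shoulders and upper back."  # assumed hidden prefix
]

cleaned = samples.map { |s| s.gsub(NON_ASCII, "") }
p cleaned[0] == cleaned[1]               # => true

# gsub! signals "no change" with nil, which would leave a nil in a mapped array.
unchanged = samples[0].dup.gsub!(NON_ASCII, "")
p unchanged                              # => nil
```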

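If you need "extended ASCII" for international character sets, such as the ISO-8859 family or Windows-1252, or even specific Unicode characters, you can widen the range by changing the numbers so that those characters are kept.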
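
For instance, here is a sketch under the assumption that you want to keep the Latin-1 range (U+00A0 to U+00FF). The sample text and constant names are made up for illustration. If a single known character is the culprit, deleting just that character is a narrower alternative:

```ruby
ASCII_ONLY  = /[^\u0001-\u007E]+/               # same range as /[^\001-\176]+/
WITH_LATIN1 = /[^\u0001-\u007E\u00A0-\u00FF]+/  # additionally keeps U+00A0..U+00FF

text = "\u2028Caf\u00E9 d\u00E9j\u00E0 vu"      # hypothetical input

ascii_result  = text.gsub(ASCII_ONLY, "")
latin1_result = text.gsub(WITH_LATIN1, "")
puts ascii_result    # Caf dj vu
puts latin1_result   # Café déjà vu

# Narrower alternative: remove only the Unicode line/paragraph separators.
targeted = text.delete("\u2028\u2029")
puts targeted        # Café déjà vu
```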

Ruby: comparing two identical strings returns false
