> [x[1].txt,x[0].txt]
[
[0] "Put your weight on to the shoulders and upper back.",
[1] "Put your weight on to the shoulders and upper back."
]
> [x[1].txt,x[0].txt].map &:class
[
[0] String < Object,
[1] String < Object
]
> x[1].txt == x[0].txt
false
How is this possible?
Update
After reading up a bit, I found this:
y = x.map{|z| z.txt.toutf8 }
[
[0] "Put your weight on to the shoulders and upper back.",
[1] "窶ィPut your weight on to the shoulders and upper back.",
[2] "窶ィPut your weight on to the shoulders and upper back."
]
So the strings are different, but without .toutf8 they look exactly the same. Why is that?
Most importantly, how can I remove these characters?
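One way to see what actually differs is to look at codepoints and bytes rather than at the rendered text. The mojibake "窶ィ" is consistent with the UTF-8 bytes E2 80 A8 (U+2028 LINE SEPARATOR, which many consoles render invisibly) being misread as Shift_JIS by the encoding guess in toutf8. The sketch below assumes that is the hidden prefix and rebuilds the strings by hand, since the real x[i].txt values are not available.

```ruby
# Assumption: the invisible prefix is U+2028 LINE SEPARATOR (UTF-8 bytes E2 80 A8).
# These strings are reconstructed by hand; the real x[i].txt values are not shown.
plain  = "Put your weight on to the shoulders and upper back."
hidden = "\u2028" + plain

puts plain == hidden                           # false
puts hidden.length - plain.length              # 1 extra character
puts hidden.dump[0, 12]                        # escaped form makes the prefix visible
p hidden.codepoints.first.to_s(16)             # "2028"
p hidden.bytes.first(3).map { |b| b.to_s(16) } # ["e2", "80", "a8"]

# List every non-ASCII character with its codepoint.
hidden.each_char.reject(&:ascii_only?).each do |ch|
  puts format("U+%04X", ch.ord)
end
```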
Answer 0 (score: 0)
The strings may be in different encodings. To find out the encoding of each string, try the following:
[x[1].txt.encoding,x[0].txt.encoding]
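If the encodings do turn out to differ, the problem may come from the interface (for example a view, a REST API endpoint, or a file source), or it may be a storage/conversion problem in the database.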
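A small sketch of why the encoding check matters, using made-up strings. Pure-ASCII strings compare equal in Ruby whatever their encoding label, but strings containing non-ASCII characters in different encodings do not, so an encoding mismatch can only explain a false result from == if at least one string holds non-ASCII data.

```ruby
utf8   = "caf\u00E9"                     # "café" in UTF-8 (5 bytes)
latin1 = utf8.encode("ISO-8859-1")       # same text in Latin-1 (4 bytes)

p [utf8.encoding.name, latin1.encoding.name] # ["UTF-8", "ISO-8859-1"]
p utf8 == latin1                             # false: different encodings, non-ASCII content
p utf8 == latin1.encode("UTF-8")             # true once both are UTF-8

# Pure-ASCII strings compare equal even with different encoding labels.
p "abc" == "abc".encode("US-ASCII")          # true
```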
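If the encodings of your strings don't match, you can convert them all to UTF-8 (the elements of x are records, so the conversion has to be applied to .txt):
y = x.map { |z| z.txt.encode("UTF-8", invalid: :replace, undef: :replace) }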
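A sketch of what invalid: :replace and undef: :replace do, on made-up input. Bytes that are invalid in the source encoding, and characters that have no equivalent in the target encoding, are swapped for a replacement string (U+FFFD for a Unicode target, "?" otherwise); the replace: option overrides that string.

```ruby
# Made-up input: Latin-1 bytes that were read in but never converted.
latin1 = "caf\xE9".dup.force_encoding("ISO-8859-1")
puts latin1.encode("UTF-8", invalid: :replace, undef: :replace) # café

# U+2028 has no US-ASCII equivalent, so it is "undefined" in the target
# encoding and gets replaced ("?" by default, or the :replace string).
s = "\u2028Put your weight"
p s.encode("US-ASCII", undef: :replace)              # "?Put your weight"
p s.encode("US-ASCII", undef: :replace, replace: "") # "Put your weight"
```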
If your encodings already match, you can use this gsub call to strip the non-ASCII characters from the strings:
y = x.map { |z| z.txt.gsub(/[^\001-\176]+/, "") }
After doing this, you get the following:
[
[0] "Put your weight on to the shoulders and upper back.",
[1] "Put your weight on to the shoulders and upper back.",
[2] "Put your weight on to the shoulders and upper back."
]
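The regex removes every character that is not between ASCII code 1 (octal 001) and ASCII code 126 (octal 176). This effectively strips the string of all non-ASCII characters (as well as ASCII 0, NUL, and ASCII 127, DEL).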
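The same regex applied to strings rebuilt by hand, assuming the hidden prefix is U+2028 LINE SEPARATOR. It uses the non-destructive gsub, because gsub! returns nil when it makes no substitution, so map { |t| t.gsub!(...) } would yield nil for strings that were already clean.

```ruby
# Hypothetical data: one clean string and two with an invisible U+2028 prefix.
clean = "Put your weight on to the shoulders and upper back."
texts = [clean, "\u2028" + clean, "\u2028" + clean]

cleaned = texts.map { |t| t.gsub(/[^\001-\176]+/, "") }
p cleaned.uniq.size            # 1: all three strings are now identical
p cleaned.all?(&:ascii_only?)  # true

# Why not gsub!: it returns nil when nothing was replaced.
p clean.dup.gsub!(/[^\001-\176]+/, "") # nil
```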
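If you need "extended ASCII" for international character sets such as ISO-8859 or Windows-1252, or even specific Unicode characters, you can widen the range by changing the numbers to include those characters.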
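A sketch of widening the range, assuming the strings are UTF-8 (the \u escapes give the regex a fixed UTF-8 encoding). The input is hypothetical, and the Latin-1 Supplement block (U+00A0 to U+00FF) and the two separators passed to String#delete are example choices, not requirements.

```ruby
s = "\u2028Caf\u00E9 au lait" # hypothetical input: U+2028 + "Café au lait"

# ASCII 1-126 only: the accented letter is lost too.
p s.gsub(/[^\001-\176]+/, "")                  # "Caf au lait"

# ASCII 1-126 plus the Latin-1 Supplement (U+00A0-U+00FF).
p s.gsub(/[^\u0001-\u007E\u00A0-\u00FF]+/, "") # "Café au lait"

# Remove only specific characters: LINE SEPARATOR and PARAGRAPH SEPARATOR.
p s.delete("\u2028\u2029")                     # "Café au lait"
```

The delete variant is the narrowest option: it leaves every other character alone, so legitimate non-ASCII text survives.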