How do I resolve the "invalid byte sequence in UTF-8" ArgumentError?

Date: 2011-05-31 02:20:19

Tags: ruby nokogiri

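I'm trying to run the following code, which uses nokogiri to parse an XML file. I want to remove the newline characters from the text contained between tags. This code used to work, but for some reason it no longer does, possibly because I upgraded to ruby-1.9.1.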

titles = node.search('b')
titles.each do |e|
  unless e.parent.name == "h4"
    if e.children.children.first.nil? == false
      puts e.children.children.first.text.gsub("\n","")
    end
  end
end

When I run the code, I get this error:

HI.  You're using libxml2 version 2.6.16 which is over 4 years old and has
plenty of bugs.  We suggest that for maximum HTML/XML parsing pleasure, you
upgrade your version of libxml2 and re-install nokogiri.  If you like using
libxml2 version 2.6.16, but don't like this warning, please define the constant
I_KNOW_I_AM_USING_AN_OLD_AND_BUGGY_VERSION_OF_LIBXML2 before requring nokogiri.

test.rb:35:in `gsub': invalid byte sequence in UTF-8 (ArgumentError)

1 Answer:

Answer 0 (score: 1)

You could try installing Ruby 1.9.2 via RVM.

curl -L https://get.rvm.io | bash
rvm install 1.9.2

If you want the RVM-installed 1.9.2 to be your default Ruby, then run:

rvm use 1.9.2 --default

Note: the above is equivalent to:

curl -L https://get.rvm.io | bash -s -- --ruby=1.9.2
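
Whichever Ruby version ends up installed, the error in the question is raised because gsub is called on a string containing bytes that are not valid UTF-8. The following is only a minimal sketch of one possible workaround, assuming it is acceptable to replace the offending bytes with "?". The helper name strip_newlines and the sample string are hypothetical and do not come from the original post or from nokogiri.

```ruby
# Hypothetical helper (not part of nokogiri or the original post):
# returns a copy of +text+ with invalid UTF-8 bytes replaced and
# newlines removed, so that gsub no longer raises ArgumentError.
def strip_newlines(text)
  clean = text.dup.force_encoding("UTF-8")
  unless clean.valid_encoding?
    # Round-trip through UTF-16LE so invalid bytes are actually replaced;
    # encoding UTF-8 to UTF-8 directly can be a no-op on older Rubies.
    clean = clean.encode("UTF-16LE", :invalid => :replace,
                                     :undef   => :replace,
                                     :replace => "?")
                 .encode("UTF-8")
  end
  clean.gsub("\n", "")
end

# Assumed sample input: \xE9 is Latin-1 "é", which is not valid UTF-8 by itself.
bad = "caf\xE9\nmenu\n"
puts strip_newlines(bad)
```

In the loop from the question, the call would become something like puts strip_newlines(e.children.children.first.text). On Ruby 2.1 and later, String#scrub offers a shorter way to replace invalid bytes. If the file is really in another encoding such as ISO-8859-1, transcoding from that encoding would preserve the characters instead of replacing them.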