Nokogiri - XML encoding problem

Date: 2016-12-04 20:53:32

Tags: ruby xml character-encoding nokogiri

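I wrote a simple Ruby script that talks to Google's search suggestion API.

By changing the "query" variable you define what to ask the API. It works fine for English, but German umlauts seem to cause encoding problems. In the example below I use the word "Tür" (door) to demonstrate the issue.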

#!/usr/bin/env ruby
# encoding: UTF-8

require 'nokogiri'
require 'open-uri'

query = 'Tür'
uri = URI.encode("http://suggestqueries.google.com/complete/search?output=toolbar&hl=de&q=#{query}")
puts uri
puts '----------'

xml_doc = Nokogiri::XML(open(uri)) 
puts xml_doc
puts '----------'

xml_doc.xpath('.//suggestion').each do |suggestion| 
  puts suggestion.attr('data')
end

Output:

http://suggestqueries.google.com/complete/search?output=toolbar&hl=de&q=T%C3%BCr
----------
element suggestion: output error : invalid character value
<?xml version="1.0"?>
<toplevel>
  <CompleteSuggestion>
    <suggestion data="t&#xFC;rkei"/>
  </CompleteSuggestion>
  <CompleteSuggestion>
    <suggestion data="t?rkis"/>
  </CompleteSuggestion>
  <CompleteSuggestion>
    <suggestion data="t?rkei news"/>
  </CompleteSuggestion>
  <CompleteSuggestion>
    <suggestion data="t?rkiye"/>
  </CompleteSuggestion>
  <CompleteSuggestion>
    <suggestion data="t?ren"/>
  </CompleteSuggestion>
  <CompleteSuggestion>
    <suggestion data="t?rstopper"/>
  </CompleteSuggestion>
  <CompleteSuggestion>
    <suggestion data="t?rschloss"/>
  </CompleteSuggestion>
  <CompleteSuggestion>
    <suggestion data="t?rkisch deutsch"/>
  </CompleteSuggestion>
  <CompleteSuggestion>
    <suggestion data="t?renheld"/>
  </CompleteSuggestion>
  <CompleteSuggestion>
    <suggestion data="t?rkisch"/>
  </CompleteSuggestion>
</toplevel>
----------
t?rkei
t?rkis
t?rkei news
t?rkiye
t?ren
t?rstopper
t?rschloss
t?rkisch deutsch
t?renheld
t?rkisch

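As you can see, the URI is valid and the API returns XML data. But the printed document already contains the encoding errors: only the first suggestion keeps its umlaut (as the character reference &#xFC;), and every other one shows a "?" in its place. I suspect Nokogiri is misconfigured, because the same URL displays correctly in Chrome. Nokogiri also reports: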

  

element suggestion: output error : invalid character value

Does anyone know how to fix this? That would be great!

1 answer:

Answer 0 (score: 0)

Try this:

xml_doc = open(uri) { |io| Nokogiri::XML(io.read.encode('UTF-8')) }
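
To illustrate why transcoding helps, here is a minimal sketch that uses only the Ruby standard library and makes no network call. It assumes the API answers in ISO-8859-1 without declaring that in the XML, which the &#xFC; in the output above suggests but which I have not verified. The names raw_body, as_utf8 and fixed are made up for this example.

```ruby
# Simulated response body, standing in for the bytes the API returns.
# Assumption: the server sends ISO-8859-1, where "ü" is the single byte 0xFC.
raw_body = "<suggestion data=\"t\xFCrkei\"/>".b

# Interpreted as UTF-8, the lone 0xFC byte is not a valid sequence,
# which is the kind of input that makes a UTF-8 parser stumble.
as_utf8 = raw_body.dup.force_encoding('UTF-8')
puts as_utf8.valid_encoding?

# Tag the bytes with the encoding they were written in, then transcode.
fixed = raw_body.dup.force_encoding('ISO-8859-1').encode('UTF-8')
puts fixed.valid_encoding?
puts fixed
```

As far as I understand, open-uri tags the string it reads with the charset from the Content-Type header, which is why io.read.encode('UTF-8') in the one-liner above has something to transcode from. If that holds, passing the encoding to the parser directly, for example Nokogiri::XML(io, nil, 'ISO-8859-1'), may be an alternative worth trying.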