Question

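I am trying to parse some xliff files with Nokogiri in Ruby and extract values (URLs). However, I found that the .text method returns incorrect text.

Specifically, the text I want to parse is: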

https://help.xxx.go.com/articleView?siteLang=ja&id=000000&language=ja&type=1

But with .text, what I get is:

https://help.xxx.go.com/articleView?siteLang=ja=000000=ja=1

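For some unknown reason, "&id", "&language" and "&type" have all disappeared. I would like to know what causes this and how to get the correct URL.

I searched for related XPath questions and documentation, but did not find the information I needed.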

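The code works fine on other text and URLs (for example "https://help.xxx.com/xxx/HTViewHelpDoc?id=abc_int_setting_up_map.htm"), so I suspect the problem is related to special characters such as '&'.

The xliff file I am trying to parse: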

<trans-unit id="xyz">
  <source>https://help.xxx.go.com/articleView?id=000000&language=en_US&type=1</source>
  <target>https://help.xxx.go.com/articleView?siteLang=ja&id=000000&language=ja&type=1</target>
</trans-unit>

The Ruby code I wrote:

require 'nokogiri'

# `file` holds the path to the xliff file
doc = Nokogiri::XML(File.open(file))
doc.remove_namespaces!
labels = doc.xpath('//trans-unit')

builder = Nokogiri::XML::Builder.new(:encoding => "UTF-8") do |xml|
  xml.Translations("xmlns" => "http://xxx.go.com/2000/00/metadata") {
    labels.each do |label|
      type = label.attr("type")
      id = label.attr("id")
      target = label.xpath('target').text
      source = label.xpath('source').text
      # Print the values of the problematic unit for debugging
      if id == "xyz"
        puts source
        puts target
      end
      # ... remaining builder code not shown
    end
  }
end

This is what I get:

https://help.xxx.go.com/articleView?id=000000=en_US=1
https://help.xxx.go.com/articleView?siteLang=ja=000000=ja=1

I expected the text I get through XPath to be identical to the original URL. Thanks in advance!
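
A possible explanation, offered as a sketch rather than a confirmed answer: in XML a literal '&' inside text must be written as '&amp;', so the <source> and <target> values above are not well-formed. Nokogiri's default XML parsing runs in recovery mode, and it appears to drop the unterminated entity references ("&id", "&language", "&type"), which would match the output shown. One workaround, assuming the xliff files cannot be fixed at their origin, is to escape bare ampersands before parsing. The helper name `escape_bare_ampersands` is hypothetical and not part of Nokogiri.

```ruby
# Escape every '&' that does not already start a predefined entity
# (&amp; &lt; &gt; &quot; &apos;) or a numeric character reference
# (&#38; or &#x26;). Hypothetical helper; uses only the Ruby standard library.
def escape_bare_ampersands(xml_text)
  xml_text.gsub(/&(?!(?:amp|lt|gt|quot|apos|#\d+|#x\h+);)/, '&amp;')
end

raw = '<target>https://help.xxx.go.com/articleView?siteLang=ja&id=000000&language=ja&type=1</target>'

# Prints the element with each bare '&' rewritten as '&amp;'
puts escape_bare_ampersands(raw)
```

The escaped string could then be passed to the parser in place of the file handle, for example `Nokogiri::XML(escape_bare_ampersands(File.read(file)))`, after which .text should return the URL with its '&' characters intact. Inspecting `doc.errors` after the original parse is one way to check whether the parser actually reported the malformed entities.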

.text in XPath omits some characters from the URL

0 answers: