I can't extract nodes from XML: xml_find_all doesn't work as expected

Time: 2019-11-09 17:23:51

Tags: r xml-parsing xml2

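My question is probably simple, but I'm having trouble working with XML. I have a list of metabolites and a database where information about them is available in XML format. I'm trying to build a synonym table so that I can convert the metabolite names I have into names better suited to downstream analysis. Below is the simple code I'm using to access the synonyms node; for some reason it doesn't work. I tried the same approach on another XML file and it succeeded. Also, any tips on how to build this table would be appreciated.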

library(xml2)

metabolites <- read_xml('<?xml version="1.0" encoding="UTF-8"?>
                    <hmdb xmlns="http://www.hmdb.ca">
                    <metabolite>
                    <version>4.0</version>
                    <creation_date>2005-11-16 15:48:42 UTC</creation_date>
                    <update_date>2019-01-11 19:13:56 UTC</update_date>
                    <accession>HMDB0000001</accession>
                    <status>quantified</status>
                    <secondary_accessions>
                    <accession>HMDB00001</accession>
                    <accession>HMDB0004935</accession>
                    </secondary_accessions>
                    <name>1-Methylhistidine</name>
                    <cs_description>1-Methylhistidine, also known as 1-mhis...</cs_description>
                    <description>One-methylhistidine (1-MHis) is derived ...</description>
                    <synonyms>
                    <synonym>(2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoic acid</synonym>
                    <synonym>1-Methylhistidine</synonym>
                    <synonym>Pi-methylhistidine</synonym>
                    <synonym>(2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoate</synonym>
                    <synonym>1 Methylhistidine</synonym>
                    </synonyms>
                    <chemical_formula>C7H11N3O2</chemical_formula>
                    <average_molecular_weight>169.1811</average_molecular_weight>
                    </metabolite>
                    </hmdb>')


syn <- xml_find_all(metabolites, "//synonyms")

Thanks!

1 answer:

Answer 0 (score: 2)

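It has to do with the namespace declaration. The root element declares a default namespace (xmlns="http://www.hmdb.ca"), so every element in the document belongs to that namespace, and an unprefixed XPath name such as //synonyms only matches elements that are in no namespace. See the discussion here: https://github.com/r-lib/xml2/issues/222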

library(xml2)

metabolites <- read_xml('<hmdb xmlns="http://www.hmdb.ca">
<metabolite>
<version>4.0</version>
<creation_date>2005-11-16 15:48:42 UTC</creation_date>
<update_date>2019-01-11 19:13:56 UTC</update_date>
<accession>HMDB0000001</accession>
<status>quantified</status>
<secondary_accessions>
<accession>HMDB00001</accession>
<accession>HMDB0004935</accession>
</secondary_accessions>
<name>1-Methylhistidine</name>
<cs_description>1-Methylhistidine, also known as 1-mhis...</cs_description>
<description>One-methylhistidine (1-MHis) is derived ...</description>
<synonyms>
<synonym>(2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoic acid</synonym>
<synonym>1-Methylhistidine</synonym>
<synonym>Pi-methylhistidine</synonym>
<synonym>(2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)propanoate</synonym>
<synonym>1 Methylhistidine</synonym>
</synonyms>
<chemical_formula>C7H11N3O2</chemical_formula>
<average_molecular_weight>169.1811</average_molecular_weight>
</metabolite>
</hmdb>')

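# xml2 exposes the default namespace under the prefix d1
xml_ns(metabolites)
#> d1 <-> http://www.hmdb.ca
# Doesn't work: the unprefixed name does not match namespaced elements
xml_find_all(metabolites, "//synonyms")
#> {xml_nodeset (0)}
# Works: qualify the element name with the d1 prefix
xml_find_all(metabolites, "//d1:synonyms")
#> {xml_nodeset (1)}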
#> [1] <synonyms>\n  <synonym>(2S)-2-amino-3-(1-Methyl-1H-imidazol-4-yl)pro ...

Created on 2019-11-09 by the reprex package (v0.3.0)
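
As for building the synonym table itself, here is a minimal sketch that applies the same d1 prefix to every step of the XPath. The helper name build_synonym_table is hypothetical, and the code assumes each <metabolite> has exactly one <name> child and keeps its synonyms under <synonyms>/<synonym>, as in the sample above; the real HMDB file may need additional handling.

```r
library(xml2)

# Small stand-in document with the same structure as the HMDB sample above.
doc <- read_xml('<hmdb xmlns="http://www.hmdb.ca">
  <metabolite>
    <name>1-Methylhistidine</name>
    <synonyms>
      <synonym>Pi-methylhistidine</synonym>
      <synonym>1 Methylhistidine</synonym>
    </synonyms>
  </metabolite>
</hmdb>')

# Hypothetical helper (not part of xml2): returns one row per
# (metabolite name, synonym) pair.
build_synonym_table <- function(doc) {
  ns <- xml_ns(doc)  # the default namespace is available as prefix d1
  mets <- xml_find_all(doc, "//d1:metabolite", ns)
  rows <- lapply(mets, function(m) {
    # Assumption: a single <name> child per metabolite.
    name <- xml_text(xml_find_first(m, "./d1:name", ns))
    syns <- xml_text(xml_find_all(m, "./d1:synonyms/d1:synonym", ns))
    data.frame(
      name = rep(name, length(syns)),
      synonym = syns,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

synonym_table <- build_synonym_table(doc)
```

The resulting data frame can then be used as a lookup, for example by matching your own metabolite names against the synonym column and taking the corresponding name. If you would rather not write prefixes at all, xml2 also provides xml_ns_strip(), which removes the default namespace from the document so that unprefixed paths such as //synonyms match.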