Extracting attribute values from MeshHeading tags using R

Date: 2017-06-23 15:25:19

Tags: r xml

This is part of my PubMed article. I am trying to extract the DescriptorName UI and QualifierName UI values that correspond to each other:

<MeshHeadingList>
        <MeshHeading>
          <DescriptorName UI="D000368" MajorTopicYN="N">Aged</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D000884" MajorTopicYN="Y">Anthropology, Cultural</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D005191" MajorTopicYN="Y">Family Characteristics</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D005260" MajorTopicYN="N">Female</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D005783" MajorTopicYN="N">Gender Identity</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D008875" MajorTopicYN="N">Middle Aged</DescriptorName>
        </MeshHeading>
        <MeshHeading>
          <DescriptorName UI="D014930" MajorTopicYN="N">Women</DescriptorName>
          <QualifierName UI="Q000523" MajorTopicYN="Y">psychology</QualifierName>
        </MeshHeading>
</MeshHeadingList>

And I want something like this:

DescriptorName UI               QualifierNAmeUI
D000368|D000884|...|D014930      NA|NA|...|Q000523

1 answer:

Answer 0 (score: 0)

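A MeSH descriptor can have zero to many qualifiers, so this may not be the best output format. For example, a single descriptor can appear with two qualifiers: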

Marriage/ethnology
Marriage/psychology

You can get the MeshHeading nodes and then apply a function that handles missing or multiple nodes (here, two or more qualifiers are joined with a comma).

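library(XML)  # provides xmlParse, getNodeSet, xpathSApply, xmlValue, xmlGetAttr

url <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=22577739&retmode=xml"
doc <- xmlParse(readLines(url))           # read the record as text, then parse it
mesh <- getNodeSet(doc, "//MeshHeading")  # one node per MeSH heading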

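# Run an XPath query on one node: NA if nothing matches,
# otherwise the results joined with commas
xpath2 <- function(x, ...) {
    y <- xpathSApply(x, ...)
    if (length(y) == 0) NA else paste(y, collapse = ",")
}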
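
To see how the helper treats a heading with no qualifier and one with several, it can be tried offline on a small inline document. This is only a sketch: the XML text and the names `demo_txt`, `demo_mesh`, `demo_desc` and `demo_qual` are made up for illustration, modelled on the snippet in the question with a second qualifier added.

```r
library(XML)

# Hypothetical two-heading document based on the XML in the question;
# the second heading carries two qualifiers to show the multi-node case.
demo_txt <- '<MeshHeadingList>
  <MeshHeading>
    <DescriptorName UI="D000368" MajorTopicYN="N">Aged</DescriptorName>
  </MeshHeading>
  <MeshHeading>
    <DescriptorName UI="D014930" MajorTopicYN="N">Women</DescriptorName>
    <QualifierName UI="Q000208" MajorTopicYN="N">ethnology</QualifierName>
    <QualifierName UI="Q000523" MajorTopicYN="Y">psychology</QualifierName>
  </MeshHeading>
</MeshHeadingList>'

demo_mesh <- getNodeSet(xmlParse(demo_txt, asText = TRUE), "//MeshHeading")

# Copy of the xpath2 helper (written with if/else) so the block runs on its own
xpath2 <- function(x, ...) {
  y <- xpathSApply(x, ...)
  if (length(y) == 0) NA else paste(y, collapse = ",")
}

demo_desc <- sapply(demo_mesh, xpath2, ".//DescriptorName", xmlGetAttr, "UI")
demo_qual <- sapply(demo_mesh, xpath2, ".//QualifierName", xmlGetAttr, "UI")
print(demo_desc)
print(demo_qual)
```

The first heading has no QualifierName, so its entry in `demo_qual` is NA; the second has two, so they come back as one comma-joined string.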

sapply(mesh, xpath2, ".//DescriptorName", xmlValue)
[1] "Adult"                      "Attitude to Health"         "Culture"
[4] "Female"                     "Gender Identity"            "Humans"  
...

m1 <- sapply(mesh, xpath2, ".//DescriptorName", xmlGetAttr, "UI")
m2 <- sapply(mesh, xpath2, ".//QualifierName", xmlGetAttr, "UI")
m2
[1] NA                "Q000208"         NA                NA                NA                NA               
[7] "Q000523"         NA                "Q000208,Q000523" NA                NA                NA               
[13] NA                NA               

data.frame(DescriptorNameUI = paste(m1, collapse="|"),
            QualifierNAmeUI = paste(m2, collapse="|"))
            DescriptorNameUI
1 D000328|D001294|D003469|D005260|D005783|D006801|D007044|D008297|D008393|D008800|D008875|D012044|D012959|D011795
QualifierNAmeUI
1 NA|Q000208|NA|NA|NA|NA|Q000523|NA|Q000208,Q000523|NA|NA|NA|NA|NA
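
Because a descriptor can carry zero or several qualifiers, a long table with one row per descriptor/qualifier pair may be easier to work with than pipe-delimited strings. The following is a minimal sketch of that alternative, not part of the original answer: the inline XML and the names `long_txt`, `long_mesh`, `heading_rows` and `long` are hypothetical, and the column is spelled `QualifierNameUI` here.

```r
library(XML)

# Hypothetical sample modelled on the XML in the question
long_txt <- '<MeshHeadingList>
  <MeshHeading>
    <DescriptorName UI="D000368" MajorTopicYN="N">Aged</DescriptorName>
  </MeshHeading>
  <MeshHeading>
    <DescriptorName UI="D014930" MajorTopicYN="N">Women</DescriptorName>
    <QualifierName UI="Q000208" MajorTopicYN="N">ethnology</QualifierName>
    <QualifierName UI="Q000523" MajorTopicYN="Y">psychology</QualifierName>
  </MeshHeading>
</MeshHeadingList>'

long_mesh <- getNodeSet(xmlParse(long_txt, asText = TRUE), "//MeshHeading")

# Hypothetical helper: build one row per qualifier of a heading,
# or a single row with NA when the heading has no qualifier.
heading_rows <- function(node) {
  d <- unlist(xpathSApply(node, "./DescriptorName", xmlGetAttr, "UI"))
  q <- unlist(xpathSApply(node, "./QualifierName", xmlGetAttr, "UI"))
  if (length(q) == 0) q <- NA_character_
  data.frame(DescriptorNameUI = d, QualifierNameUI = q,
             stringsAsFactors = FALSE)
}

# Stack the per-heading data frames into one long table
long <- do.call(rbind, lapply(long_mesh, heading_rows))
print(long)
```

With this layout the descriptor UI is repeated for each of its qualifiers, so no delimiter has to be parsed back out later.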