I have an XML file, shown below, from which I want to collect some values:
<dataDscr>
  <var ID="V335" name="question1" files="F1" dcml="0" intrvl="discrete">
    <location width="1"/>
    <labl>
      question 1 label
    </labl>
    <qstn>
      <qstnLit>
        question 1 literal question
      </qstnLit>
      <ivuInstr>
        question 1 interviewer instructions
      </ivuInstr>
    </qstn>
  </var>
  <var ID="V335" name="question2" files="F1" dcml="0" intrvl="discrete">
    <location width="1"/>
    <labl>
      question 2 label
    </labl>
    <qstn>
      <preQTxt>
        question 2 pre question text
      </preQTxt>
      <qstnLit>
        question 2 literal question
      </qstnLit>
      <ivuInstr>
        question 2 interviewer instructions
      </ivuInstr>
    </qstn>
  </var>
  <var ID="V335" name="question3" files="F1" dcml="0" intrvl="discrete">
    <location width="1"/>
    <labl>
      question 3 label
    </labl>
    <qstn>
      <preQTxt>
        question 3 pre question text
      </preQTxt>
      <qstnLit>
        question 3 literal question
      </qstnLit>
    </qstn>
  </var>
</dataDscr>
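Specifically, I want the values of all children of `<qstn>`, together with the `name` attribute of the parent `<var>` tag (e.g. "question1"). Note that the number of children of `<qstn>` varies. For example, `question1` has two children, `<qstnLit>` and `<ivuInstr>`, while `question2` has every child a `<qstn>` can have.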
I would like the final result to be a single table of these values.
Thanks!
Answer 0 (score: 1)
This should work for your case:
library(tidyverse)
library(xml2)

doc <- read_xml("data.xml")

# get all var elements
vars <- xml_find_all(doc, "//var")

# extract from each "var" element the children of the "qstn" elements,
# then take the tag names and the enclosed text and put each in a column
df_long <- do.call(rbind, lapply(vars, function(x) {
  lbl  <- xml_attr(x, "name")
  tags <- xml_find_all(x, "qstn/*")
  data.frame(name = lbl,
             col  = xml_name(tags),
             txt  = trimws(xml_text(tags)),
             stringsAsFactors = FALSE)
}))

# spread the data frame to wide format; id_cols is named explicitly
# because recent tidyr releases warn when it is passed by position
df <- df_long %>% pivot_wider(id_cols = name, names_from = col, values_from = txt)
Output:
# A tibble: 3 x 4
  name      qstnLit                     ivuInstr                            preQTxt
  <chr>     <chr>                       <chr>                               <chr>
1 question1 question 1 literal question question 1 interviewer instructions NA
2 question2 question 2 literal question question 2 interviewer instructions question 2 pre question text
3 question3 question 3 literal question NA                                  question 3 pre question text
Here, `pivot_wider` handles the varying number of columns, putting `NA` wherever a `var` element lacks the corresponding child element.
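To see the `NA` filling in isolation, here is a minimal sketch on a small hand-built long data frame. The names `q1`, `q2` and the text values are made up for illustration and are not taken from the XML above; the sketch assumes a tidyr version that provides `pivot_wider` (1.0.0 or later).

```r
library(tidyr)

# Hypothetical long-format data: "q1" has no preQTxt row at all
df_long <- data.frame(
  name = c("q1", "q2", "q2"),
  col  = c("qstnLit", "qstnLit", "preQTxt"),
  txt  = c("literal 1", "literal 2", "pre text 2"),
  stringsAsFactors = FALSE
)

# One row per name, one column per distinct value of col;
# name/col combinations absent from df_long are filled with NA
df_wide <- pivot_wider(df_long, id_cols = name,
                       names_from = col, values_from = txt)

print(df_wide)
```

If a placeholder other than `NA` is preferred, `pivot_wider` also takes a `values_fill` argument.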