Using R to create new columns for each question in an XML file

Time: 2016-03-24 04:47:23

Tags: xml r web-scraping

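Here's an excerpt of the XML file. I figured out how to list all the questions, all the correct answers, and all the wrong answers. This is the code I used: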

#loads package
library(XML)
xmlfile=xmlTreeParse("cowen.xml")
class(xmlfile)
xmltop = xmlRoot(xmlfile) #gives content of root

#Gets all the questions
Questions = sapply(getNodeSet(xmltop,"//quiz/question/name/text"), function(x) xmlSApply(x, xmlValue))
#dataframe of questions
Q = as.data.frame(Questions)

#Gets all the correct answers
CorrectAnswers = sapply(getNodeSet(xmltop ,"//quiz/question/answer[@fraction='100']/text"), function(x) xmlSApply(x, xmlValue))
#dataframe of correct answers
CA = as.data.frame(CorrectAnswers)

#Gets all the wrong answers (but it doesn't group them by question)
WrongAnswers = sapply(getNodeSet(xmltop,"//quiz/question/answer[@fraction='0']/text"), function(x) xmlSApply(x, xmlValue))
#dataframe of wrong answers
WA = as.data.frame(WrongAnswers)

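I want to create a dataset with five columns. Column 1 has the question, column 2 has the correct answer, and columns 3-5 have the wrong answers. I'm not sure how to write a loop/function that goes through each question node, picks out only the wrong answers, and then creates three columns, one per wrong answer. In the XML file:
      <answer fraction="100"> marks the correct answer, and <answer fraction="0"> marks a wrong answer.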

1 answer:

Answer 0 (score: 1)

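I would just apply the functions to the same getNodeSet result, so that every XPath is evaluated relative to its own question node: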

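library(XML)

# Parse into an internal document so XPath can be evaluated relative to each node
doc <- xmlParse("file.xml")
# One node per multiple-choice question
q1 <- getNodeSet(doc, "//question[@type='multichoice']")

# Each "./..." path is resolved inside a single question node
Q <- sapply(q1, function(x) xpathSApply(x, "./name/text", xmlValue))
CA <- sapply(q1, function(x) xpathSApply(x, "./answer[@fraction='100']/text", xmlValue))
WA <- sapply(q1, function(x) xpathSApply(x, "./answer[@fraction='0']/text", xmlValue))

# When every question has the same number of wrong answers, WA is a matrix with
# one column per question; transpose it to get one row per question
data.frame(Q, CA, t(WA))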
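
One caveat: the t(WA) step only works when sapply can simplify its result to a matrix, which requires every question to have the same number of wrong answers. If that does not hold for your file, a rough sketch like the one below pads each question's wrong answers with NA first. The inline XML string, the helper name wrong_padded, and the column names WA1..WA3 are made up for illustration; the real file's layout is only assumed to match the XPaths used above.

```r
library(XML)

# Hypothetical sample input (assumption: the real file has this layout)
xml_text <- '<quiz>
  <question type="multichoice">
    <name><text>Q1</text></name>
    <answer fraction="100"><text>right 1</text></answer>
    <answer fraction="0"><text>wrong 1a</text></answer>
    <answer fraction="0"><text>wrong 1b</text></answer>
    <answer fraction="0"><text>wrong 1c</text></answer>
  </question>
  <question type="multichoice">
    <name><text>Q2</text></name>
    <answer fraction="100"><text>right 2</text></answer>
    <answer fraction="0"><text>wrong 2a</text></answer>
  </question>
</quiz>'

doc <- xmlParse(xml_text, asText = TRUE)
q1 <- getNodeSet(doc, "//question[@type='multichoice']")

# Hypothetical helper: wrong answers of one question, padded with NA to length n
wrong_padded <- function(node, n = 3) {
  wa <- xpathSApply(node, "./answer[@fraction='0']/text", xmlValue)
  wa <- as.character(unlist(wa))
  length(wa) <- n  # pads with NA (or truncates) so every question yields n values
  wa
}

Q  <- sapply(q1, function(x) xpathSApply(x, "./name/text", xmlValue))
CA <- sapply(q1, function(x) xpathSApply(x, "./answer[@fraction='100']/text", xmlValue))
WA <- t(sapply(q1, wrong_padded))  # one row per question, n columns
colnames(WA) <- paste0("WA", seq_len(ncol(WA)))

result <- data.frame(Q, CA, WA, stringsAsFactors = FALSE)
print(result)
```

Padding to a fixed width keeps the one-row-per-question shape; if you would rather keep a variable number of wrong answers, a long format (one row per question/answer pair) may be a better fit. Note that Q and CA are still assumed to have exactly one match per question.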