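Yesterday I scraped a website that requires login. The response is in XML format, as shown below. I'm having trouble processing it because some teachers belong to two departments, and I don't need the first three lines, since they only indicate that I logged in successfully. I need to turn it into a data frame (or a list, or JSON).
My code: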
ID <- xpathApply(xml, "//teacher[@id]")
ID_unlist <- unlist(ID)
matrix <- as.data.frame(matrix(ID_unlist),nrow= 2, byrow=TRUE)
Error in prettyNum(.Internal(format(x, trim, digits, nsmall, width, 3L, :
first argument must be atomic
XML:
<result status="success">
<code>1</code>
<note>success</note>
<teacherList>
<teacher id="D95">
<name>Mary</name>
<department id="420">
<name>Math</name>
</department>
<department id="421">
<name>Statistics</name>
</department>
</teacher>
<teacher id="D73">
<name>Adam</name>
<department id="412">
<name>English</name>
</department>
</teacher>
</teacherList>
</result>
My expected result would be:
t_id teacher d_id department
D95 Mary 420 Math
D95 Mary 421 Statistics
D73 Adam 412 English
Answer 0 (score: 2)
This may not be the most efficient approach, but it works.
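library(XML)
# `content` is assumed to hold the XML response (a parsed document or the raw XML text)
content_list <- XML::xmlToList(content)
# Build one block of rows per teacher, then stack the blocks into a single table
df <- as.data.frame(do.call(rbind,
  lapply(content_list$teacherList, function(teacher) {
    # One row per <department>; the teacher id and name are recycled across those rows
    departments <- do.call(rbind, teacher[names(teacher) == "department"])
    unname(cbind(teacher$.attrs, teacher$name, departments))
  })
))
colnames(df) <- c("id", "teacher", "department", "did")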
id teacher department did
1 D95 Mary Math 420
2 D95 Mary Statistics 421
3 D73 Adam English 412
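As an alternative, here is a minimal sketch that walks the `<department>` nodes with XPath instead of converting the whole document to a list. It is not part of the answer above. It assumes the response has the same shape as the XML in the question; `xml_text`, `doc`, and `departments` are illustrative names, and the XML is inlined here only to keep the example self-contained. Because each `<department>` yields exactly one row, a teacher with two departments appears twice, and the `<code>` and `<note>` lines at the top are simply never selected.

```r
library(XML)

# Sample response copied from the question (assumed to match the real page)
xml_text <- '<result status="success">
<code>1</code>
<note>success</note>
<teacherList>
<teacher id="D95">
<name>Mary</name>
<department id="420"><name>Math</name></department>
<department id="421"><name>Statistics</name></department>
</teacher>
<teacher id="D73">
<name>Adam</name>
<department id="412"><name>English</name></department>
</teacher>
</teacherList>
</result>'

doc <- xmlParse(xml_text, asText = TRUE)

# Select every <department> that sits directly under a <teacher>
departments <- getNodeSet(doc, "//teacher/department")

# For each department, read its own id and name, and look up the
# id and name of the enclosing <teacher> through xmlParent()
df <- data.frame(
  t_id       = sapply(departments, function(d) xmlGetAttr(xmlParent(d), "id")),
  teacher    = sapply(departments, function(d) xmlValue(xmlParent(d)[["name"]])),
  d_id       = sapply(departments, function(d) xmlGetAttr(d, "id")),
  department = sapply(departments, function(d) xmlValue(d[["name"]])),
  stringsAsFactors = FALSE
)

print(df)
```

With this approach the columns are plain character vectors rather than list columns, which may be easier to work with afterwards (for example when writing the result to CSV or JSON).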