I have XML like this:
<SoccerFeed timestamp="20181123T153249+0000">
<SoccerDocument season_name="Season 2016/2017" season_id="2016" competition_name="French Ligue 1" competition_id="24" competition_code="FR_L1" Type="SQUADS Latest">
<Team web_address="www.angers-sco.fr" uID="t2128" short_club_name="Angers" region_name="Europe" region_id="17" country_iso="FR" country_id="8" country="France">
<Founded>1919</Founded>
<Name>Angers</Name>
<Player uID="p40511">
<Name>Denis Petric</Name>
<Position>Goalkeeper</Position>
<Stat Type="first_name">Denis</Stat>
<Stat Type="last_name">Petric</Stat>
<Stat Type="birth_date">1988-05-24</Stat>
<Stat Type="weight">83</Stat>
<Stat Type="height">187</Stat>
<Stat Type="jersey_num">1</Stat>
<Stat Type="real_position">Goalkeeper</Stat>
<Stat Type="real_position_side">Unknown</Stat>
<Stat Type="join_date">2016-01-02</Stat>
<Stat Type="country">Slovenia</Stat>
</Player>
<Player uID="p119744">
<Name>Mathieu Michel</Name>
<Position>Goalkeeper</Position>
<Stat Type="first_name">Mathieu</Stat>
<Stat Type="last_name">Michel</Stat>
<Stat Type="birth_date">1991-09-04</Stat>
<Stat Type="birth_place">Nîmes</Stat>
<Stat Type="first_nationality">France</Stat>
<Stat Type="preferred_foot">Right</Stat>
<Stat Type="weight">84</Stat>
<Stat Type="height">189</Stat>
<Stat Type="jersey_num">1</Stat>
<Stat Type="real_position">Goalkeeper</Stat>
<Stat Type="real_position_side">Unknown</Stat>
<Stat Type="join_date">2016-08-18</Stat>
<Stat Type="country">France</Stat>
</Player>
So far, I have run the following code:
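library(tidyverse)
library(xml2)

x <- read_xml('player.xml')

Players3 <- x %>%
  xml_find_all('//Player') %>%
  map_df(~flatten(c(xml_attrs(.x),              # Player attributes (uID)
                    map(xml_children(.x),
                        # one list per child, named by the element name,
                        # so every <Stat> child ends up named "Stat"
                        ~set_names(as.list(xml_text(.x)), xml_name(.x)))))) %>%
  type_convert()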
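However, for each player (by uID) I only get Name, Position, Loan and a single Stat value.
I'm stuck because each player has several child nodes with the same name (Stat). I would like to get a data frame from this XML file with one column per Stat Type.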
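To see where the collision comes from, here is a small sketch using only xml2 on a hypothetical, trimmed-down excerpt of one Player node: the element names repeat, while the Type attribute is what actually distinguishes the Stat nodes.

```r
library(xml2)

# Hypothetical, trimmed-down excerpt of one <Player> node from the XML above
player <- read_xml('<Player uID="p40511">
  <Name>Denis Petric</Name>
  <Position>Goalkeeper</Position>
  <Stat Type="first_name">Denis</Stat>
  <Stat Type="last_name">Petric</Stat>
</Player>')

kids <- xml_children(player)

# Element names: every statistic is called "Stat", so the names collide
kid_names <- xml_name(kids)
# value: c("Name", "Position", "Stat", "Stat")

# The Type attribute tells the Stat nodes apart (NA where it is absent)
kid_types <- xml_attr(kids, "Type")
# value: c(NA, NA, "first_name", "last_name")
```

Using the Type value as the column name, with the element name as a fallback, is what removes the duplicates.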
Something like:
uID | Name | Position | first_name | last_name | birth_date | weight | height | jersey_num | real_position | real_position_side | join_date | country | Loan
Also, it would be great if I could get information from the parent node as well, such as the Team uID and short_club_name.
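The question's own pipeline can be kept close to its original shape if each child is named by its Type attribute when it has one, falling back to the element name otherwise. A minimal sketch of that adjustment, assuming purrr and dplyr (both loaded by tidyverse) and a hypothetical two-player excerpt; unlike the question, it builds each row directly instead of going through flatten(). The question's trailing type_convert() step could be appended unchanged.

```r
library(xml2)
library(purrr)
library(dplyr)

# Hypothetical two-player excerpt in the same shape as the XML in the question
x <- read_xml('<Team uID="t2128" short_club_name="Angers">
  <Player uID="p40511">
    <Name>Denis Petric</Name>
    <Stat Type="first_name">Denis</Stat>
    <Stat Type="weight">83</Stat>
  </Player>
  <Player uID="p119744">
    <Name>Mathieu Michel</Name>
    <Stat Type="first_name">Mathieu</Stat>
    <Stat Type="preferred_foot">Right</Stat>
  </Player>
</Team>')

players <- x %>%
  xml_find_all("//Player") %>%
  map_dfr(function(p) {
    kids <- xml_children(p)
    # Column name = Type attribute if present, otherwise the element name
    col_names <- coalesce(xml_attr(kids, "Type"), xml_name(kids))
    # One named list per player: uID first, then one entry per child node
    row <- c(list(uID = xml_attr(p, "uID")),
             set_names(as.list(xml_text(kids)), col_names))
    tibble::as_tibble(row)
  })
```

Columns that a player lacks (here weight or preferred_foot) are filled with NA when the rows are bound together.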
Answer (score: 1)
Here is an attempted solution. See the comments for an explanation of each step:
library(xml2)
library(dplyr)

x <- read_xml('player.xml')
Players3 <- x %>% xml_find_all('//Player')

dfs <- lapply(Players3, function(node) {
  # find the names of all child nodes
  childnodes <- node %>% xml_children() %>% xml_name()
  # find the Type attribute of all child nodes (NA where it is absent)
  col_names <- node %>% xml_children() %>% xml_attr("Type")
  # use the Type value as column name, or the node name if there is no Type
  col_names <- ifelse(is.na(col_names), childnodes, col_names)
  # find all values
  values <- node %>% xml_children() %>% xml_text()
  # create a one-row data frame and label the columns properly
  df <- data.frame(t(values), stringsAsFactors = FALSE)
  names(df) <- col_names
  df
})

# bind the rows together (missing columns become NA) and add the player uID
answer <- bind_rows(dfs)
answer$UID <- Players3 %>% xml_attr("uID")
answer
# Name Position first_name last_name birth_date weight height jersey_num real_position
# 1 Denis Petric Goalkeeper Denis Petric 1988-05-24 83 187 1 Goalkeeper
# 2 Mathieu Michel Goalkeeper Mathieu Michel 1991-09-04 84 189 1 Goalkeeper
# real_position_side join_date country birth_place first_nationality preferred_foot UID
# 1 Unknown 2016-01-02 Slovenia <NA> <NA> <NA> p40511
# 2 Unknown 2016-08-18 France Nimes France Right p119744
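The answer does not cover the last part of the question (the parent Team's uID and short_club_name). One way to get them, sketched here under the assumption that every Player sits directly inside a Team element as in the excerpt, is an XPath parent:: lookup per player; xml_find_first() returns one node per input node, so the result lines up row by row with the players. The second team below (t9999, "Other", p1, "Someone Else") is invented purely for illustration.

```r
library(xml2)

# Hypothetical document with two teams, so the per-player lookup is visible
x <- read_xml('<SoccerDocument>
  <Team uID="t2128" short_club_name="Angers">
    <Player uID="p40511"><Name>Denis Petric</Name></Player>
    <Player uID="p119744"><Name>Mathieu Michel</Name></Player>
  </Team>
  <Team uID="t9999" short_club_name="Other">
    <Player uID="p1"><Name>Someone Else</Name></Player>
  </Team>
</SoccerDocument>')

players <- xml_find_all(x, "//Player")
# The parent <Team> of every player, one per player and in the same order
teams <- xml_find_first(players, "parent::Team")

team_info <- data.frame(
  UID = xml_attr(players, "uID"),
  team_uID = xml_attr(teams, "uID"),
  short_club_name = xml_attr(teams, "short_club_name"),
  stringsAsFactors = FALSE
)
team_info
```

Because the key column is called UID to match the answer's output, the two tables can then be combined with dplyr::left_join(answer, team_info, by = "UID").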