I've run into a problem: I have an XML document that I need to load into a data.frame in R.
So far I've managed to load a simple XML file into a data.frame using the XML and plyr packages by running:
dataframe=ldply(xmlToList("file.xml"), data.frame)
But when I run it on this XML:
<BusinessUnitList>
<BusinessUnit id="000000195">
<User id="897654322" firstName="Rick" lastName="Test" middleName="R" defaultLanguageName="English">
<RoleList>
<Role id="worker"/>
</RoleList>
<OrgList>
<Organization id="1111"/>
</OrgList>
<Address country="Italy"/>
<Employee badgeNumber="575757" Date="2017-01-01" DateNew="2017-01-02" birthDate="1999-01-01">
<Availability val1="5" val2="n" val3="6" HoursPerWeek="33.75" HoursBetweenShifts="10" minHoursPerWeek="00.00"/>
</Employee>
</User>
</BusinessUnit>
<BusinessUnit id="000000111">
<User id="897652222" firstName="TERI" lastName="tst2" middleName="D" defaultLanguageName="English">
<RoleList>
<Role id="worker"/>
</RoleList>
<OrgList>
<Organization id="2222"/>
</OrgList>
<Address country="Portugal"/>
<Employee badgeNumber="575757" Date="2017-02-02" DateNew="2017-02-02" birthDate="1998-01-01">
<Availability val1="5" val2="n" val3="6" HoursPerWeek="33.75" HoursBetweenShifts="10" minHoursPerWeek="00.00"/>
</Employee>
</User>
</BusinessUnit>
</BusinessUnitList>
I get this error: Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 9, 7
Answer 0 (score: 0)
You are trying to combine a list whose components have different lengths, like this:
list(a=1:2, b=3:5)
$a
[1] 1 2
$b
[1] 3 4 5
data.frame( list(a=1:2, b=3:5))
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 2, 3
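To make the length mismatch concrete, here is a minimal sketch in base R. The names `lst`, `n`, `padded` and `df` are made up for illustration, and padding with NA is just one possible workaround, not something the question asked for:

```r
# Hypothetical list with components of unequal length, as in the example above.
lst <- list(a = 1:2, b = 3:5)

# lengths() reports the size of each component; data.frame() needs these
# to agree (or to be recyclable to a common length).
n <- lengths(lst)

# Pad every component with NA up to the longest length.
padded <- lapply(lst, function(v) {
  length(v) <- max(n)  # extending a vector fills the new slots with NA
  v
})

# Now every component has the same length, so data.frame() accepts it.
df <- data.frame(padded)
```

Padding is reasonable for flat lists like this one. For nested XML it is usually cleaner to flatten the structure first.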
I would probably unlist the xmlToList result and then clean up the column names:
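library(XML)
doc <- xmlParse("file.xml")
# Flatten the nested list into one named vector, then into a one-row data.frame
x <- data.frame( t( unlist(xmlToList(doc))) )
# Turn a trailing "..attrs.id" or ".id" into "_id" so the element name is kept
names(x) <- gsub("(..attrs)?.id$", "_id", names(x))
# Drop everything up to the last dot, leaving the bare attribute name
names(x) <- gsub(".*\\.", "", names(x))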
  Role_id Organization_id country val1 val2 val3 HoursPerWeek HoursBetweenShifts
1  worker            1111   Italy    5    n    6        33.75                 10
  minHoursPerWeek badgeNumber       Date    DateNew  birthDate   User_id firstName
1           00.00      575757 2017-01-01 2017-01-02 1999-01-01 897654322      Rick
  lastName middleName defaultLanguageName BusinessUnit_id
1     Test          R             English       000000195
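Note that data.frame(t(unlist(...))) always produces a single row. If you want one row per BusinessUnit instead, a possible variation is to flatten each BusinessUnit node separately and bind the rows. This is only a sketch: the helper `flatten_unit`, the "Element_attribute" column naming, and the trimmed inline XML are my own choices for illustration, not part of the question.

```r
library(XML)

# Trimmed copy of the question's XML, inlined so the sketch is self-contained.
xml_text <- '<BusinessUnitList>
  <BusinessUnit id="000000195">
    <User id="897654322" firstName="Rick">
      <Address country="Italy"/>
    </User>
  </BusinessUnit>
  <BusinessUnit id="000000111">
    <User id="897652222" firstName="TERI">
      <Address country="Portugal"/>
    </User>
  </BusinessUnit>
</BusinessUnitList>'

doc <- xmlParse(xml_text, asText = TRUE)

# One node per <BusinessUnit>; each one becomes a row.
units <- getNodeSet(doc, "//BusinessUnit")

# Hypothetical helper: gather the attributes of a node and of all its
# descendants into a one-row data.frame, naming columns "Element_attribute".
flatten_unit <- function(node) {
  nodes <- c(list(node), getNodeSet(node, ".//*"))
  attrs <- lapply(nodes, function(n) {
    a <- xmlAttrs(n)
    if (is.null(a)) return(NULL)  # element without attributes
    names(a) <- paste(xmlName(n), names(a), sep = "_")
    a
  })
  as.data.frame(t(unlist(attrs)), stringsAsFactors = FALSE)
}

# rbind() assumes every unit yields the same columns, as in the question's file.
result <- do.call(rbind, lapply(units, flatten_unit))
```

If some units carry attributes that others lack, base rbind() will fail on the mismatched columns. plyr's rbind.fill() should handle that case by filling the gaps with NA.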