I have an XML file that contains:
<?xml version="1.0" encoding="UTF-8" ?>
<Repository xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<DECLARE>
<PhysicalColumn name="Department" parentName="&quot;Sample App Lite Data&quot;...&quot;D20 Offices&quot;" parentId="3001:129" parentUid="80ca6538-0bb9-0000-714b-e31d00000000" id="3003:484" uid="80ca6539-0bbb-0000-714b-e31d00000000" dataType="VARCHAR" precision="20" extName="//Table/SAMP_OFFICES_D/DEPARTMENT" specialType="none">
<SourceColumn>
<RefPhysicalColumn id="3003:427" uid="80ca64f9-0bbb-0000-714b-e31d00000000" qualifiedName="&quot;Sample App Lite Data&quot;...&quot;SAMP_OFFICES_D&quot;.&quot;Department&quot;"/>
</SourceColumn>
</PhysicalColumn>
<LogicalTable name="D2 Offices" parentName="&quot;SampleApp Lite&quot;" parentId="2000:42377" parentUid="80cb6802-07d0-0000-714b-e31d00000000" id="2035:42562" uid="80cb68bb-07f3-0000-714b-e31d00000000" x="938" y="669">
<Description><![CDATA[This logical table maps to the physical Office Dimension table with various attributes.]]></Description>
<Columns>
<RefLogicalColumn id="2006:42563" uid="80cb68bc-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Office&quot;"/>
<RefLogicalColumn id="2006:42564" uid="80cb68bd-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Office Key&quot;"/>
<RefLogicalColumn id="2006:42565" uid="80cb68be-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Department&quot;"/>
<RefLogicalColumn id="2006:42566" uid="80cb68bf-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Dept Key&quot;"/>
<RefLogicalColumn id="2006:42567" uid="80cb68c0-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Organization&quot;"/>
<RefLogicalColumn id="2006:42568" uid="80cb68c1-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Org Key&quot;"/>
<RefLogicalColumn id="2006:42569" uid="80cb68c2-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Company&quot;"/>
<RefLogicalColumn id="2006:42570" uid="80cb68c3-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Company Key&quot;"/>
<RefLogicalColumn id="2006:42571" uid="80cb68c4-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Office Sequence&quot;"/>
</Columns>
<TableSources>
<RefLogicalTableSource id="2037:43058" uid="80cb6a2c-07f5-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;LTS1 Offices&quot;"/>
</TableSources>
</LogicalTable>
<LogicalTableSource name="LTS1 Offices" parentName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;" parentId="2035:42562" parentUid="80cb68bb-07f3-0000-714b-e31d00000000" id="2037:43058" uid="80cb6a2c-07f5-0000-714b-e31d00000000" isActive="true">
<Link>
<StartNode>
<RefPhysicalTable id="3001:129" uid="80ca6538-0bb9-0000-714b-e31d00000000" qualifiedName="&quot;Sample App Lite Data&quot;...&quot;D20 Offices&quot;"/>
</StartNode>
</Link>
<WhereClause>
<Expr></Expr>
</WhereClause>
<GroupBy>
<Expr><![CDATA[ GROUPBYLEVEL("SampleApp Lite"."H2 Offices"."Offices Detail")]]></Expr>
</GroupBy>
<FragmentContent>
<Expr></Expr>
</FragmentContent>
</LogicalTableSource>
<PresentationColumn name="Department" parentName="&quot;Sample Targets Lite&quot;..&quot;Offices&quot;" parentId="4008:43412" parentUid="80cb6c16-0fa8-0000-714b-e31d00000000" id="4010:43649" uid="80cb6d77-0faa-0000-714b-e31d00000000" hasDispName="false" hasDispDescription="false" overrideLogicalName="false">
<Description><![CDATA[Returns the Department description from the Office dimension. Naturally drills into Office Column.]]></Description>
<RefLogicalColumn id="2006:42565" uid="80cb68be-07d6-0000-714b-e31d00000000" qualifiedName="&quot;SampleApp Lite&quot;.&quot;D2 Offices&quot;.&quot;Department&quot;"/>
</PresentationColumn>
</DECLARE>
</Repository>
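From this file I need to find the source of a Presentation Column, that is, its physical column name and physical table, by following the different IDs. For example, the PresentationColumn with name="Department" has RefLogicalColumn id="2006:42565".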
<**PresentationColumn name="Department"** parentName=""Sample Targets Lite".."Offices"" parentId="4008:43412" parentUid="80cb6c16-0fa8-0000-714b-e31d00000000" id="4010:43649" uid="80cb6d77-0faa-0000-714b-e31d00000000" hasDispName="false" hasDispDescription="false" overrideLogicalName="false">
<Description><![CDATA[Returns the Department description from the Office dimension. Naturally drills into Office Column.]]></Description>
<**RefLogicalColumn id="2006:42565"** uid="80cb68be-07d6-0000-714b-e31d00000000" qualifiedName=""SampleApp Lite"."D2 Offices"."Department""/>
</PresentationColumn>
Using RefLogicalColumn id="2006:42565", we search the LogicalTable elements for the one that contains this RefLogicalColumn id.
<LogicalTable name="D2 Offices" parentName=""SampleApp Lite"" parentId="2000:42377" parentUid="80cb6802-07d0-0000-714b-e31d00000000" id="2035:42562" uid="80cb68bb-07f3-0000-714b-e31d00000000" x="938" y="669">
<Description><![CDATA[This logical table maps to the physical Office Dimension table with various attributes.]]></Description>
<Columns>
<RefLogicalColumn id="2006:42563" uid="80cb68bc-07d6-0000-714b-e31d00000000" qualifiedName=""SampleApp Lite"."D2 Offices"."Office""/>
<RefLogicalColumn id="2006:42564" uid="80cb68bd-07d6-0000-714b-e31d00000000" qualifiedName=""SampleApp Lite"."D2 Offices"."Office Key""/>
<**RefLogicalColumn id="2006:42565"** uid="80cb68be-07d6-0000-714b-e31d00000000" qualifiedName=""SampleApp Lite"."D2 Offices"."Department""/>
<RefLogicalColumn id="2006:42566" uid="80cb68bf-07d6-0000-714b-e31d00000000" qualifiedName=""SampleApp Lite"."D2 Offices"."Dept Key""/>
<RefLogicalColumn id="2006:42567" uid="80cb68c0-07d6-0000-714b-e31d00000000" qualifiedName=""SampleApp Lite"."D2 Offices"."Organization""/>
<RefLogicalColumn id="2006:42568" uid="80cb68c1-07d6-0000-714b-e31d00000000" qualifiedName=""SampleApp Lite"."D2 Offices"."Org Key""/>
<RefLogicalColumn id="2006:42569" uid="80cb68c2-07d6-0000-714b-e31d00000000" qualifiedName=""SampleApp Lite"."D2 Offices"."Company""/>
<RefLogicalColumn id="2006:42570" uid="80cb68c3-07d6-0000-714b-e31d00000000" qualifiedName=""SampleApp Lite"."D2 Offices"."Company Key""/>
<RefLogicalColumn id="2006:42571" uid="80cb68c4-07d6-0000-714b-e31d00000000" qualifiedName=""SampleApp Lite"."D2 Offices"."Office Sequence""/>
</Columns>
<TableSources>
<**RefLogicalTableSource id="2037:43058"** uid="80cb6a2c-07f5-0000-714b-e31d00000000" qualifiedName=""SampleApp Lite"."D2 Offices"."LTS1 Offices""/>
</TableSources>
</LogicalTable>
Then, using RefLogicalTableSource id="2037:43058", we search the LogicalTableSource elements by id.
<LogicalTableSource name="LTS1 Offices" parentName=""SampleApp Lite"."D2 Offices"" parentId="2035:42562" parentUid="80cb68bb-07f3-0000-714b-e31d00000000" **id="2037:43058"** uid="80cb6a2c-07f5-0000-714b-e31d00000000" isActive="true">
<Link>
<StartNode>
<**RefPhysicalTable id="3001:129"** uid="80ca6538-0bb9-0000-714b-e31d00000000" qualifiedName=""Sample App Lite Data"..."D20 Offices""/>
</StartNode>
</Link>
<WhereClause>
<Expr></Expr>
</WhereClause>
<GroupBy>
<Expr><![CDATA[ GROUPBYLEVEL("SampleApp Lite"."H2 Offices"."Offices Detail")]]></Expr>
</GroupBy>
<FragmentContent>
<Expr></Expr>
</FragmentContent>
</LogicalTableSource>
Then, using RefPhysicalTable id="3001:129", we search the PhysicalColumn elements by parentId.
<PhysicalColumn name="Department" parentName=""Sample App Lite Data"..."D20 Offices"" **parentId="3001:129"** parentUid="80ca6538-0bb9-0000-714b-e31d00000000" id="3003:484" uid="80ca6539-0bbb-0000-714b-e31d00000000" dataType="VARCHAR" precision="20" extName="//Table/SAMP_OFFICES_D/DEPARTMENT" specialType="none">
<SourceColumn>
<RefPhysicalColumn id="3003:427" uid="80ca64f9-0bbb-0000-714b-e31d00000000" qualifiedName=""Sample App Lite Data"..."SAMP_OFFICES_D"."Department""/>
</SourceColumn>
</PhysicalColumn>
Here we need PhysicalColumn name="Department" and extName="//Table/SAMP_OFFICES_D/DEPARTMENT".
My first problem is converting the XML file to a data frame; the second is tracing back the source.
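Answer 0 (score: 0)
xml2::read_xml
will help you read the file. The rest is harder, because it looks like you have three relational tables. See this page, and possibly also this, although when I tried it, merging everything into one table made it messy.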
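library(xml2)
library(tidyverse)

# Parse the repository XML (replace the path with your own file)
dfxml1 <- xml2::read_xml("C:/foo/bar.xml")

# Normalise column names: lower case, punctuation/whitespace to "_", made unique
mcga <- function(tbl) {
  x <- colnames(tbl)
  x <- tolower(x)
  x <- gsub("[[:punct:][:space:]]+", "_", x)
  x <- gsub("_+", "_", x)
  x <- gsub("(^_|_$)", "", x)
  x <- make.unique(x, sep = "_")
  colnames(tbl) <- x
  tbl
}

# One row per element that has attributes, one column per attribute name
dfxml2 <- xml_find_all(dfxml1, ".//*") %>%
  map_df(~{
    xml_attrs(.x) %>%
      as.list()
  }) %>%
  mcga()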
Or split them into three tables.
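# Attributes of every element nested inside a LogicalTable
LogicalTable <- xml_find_all(dfxml1, ".//LogicalTable//*") %>%
  map_df(~{
    xml_attrs(.x) %>%
      as.list()
  }) %>%
  mcga()

# Attributes of the PhysicalColumn elements themselves
PhysicalTable <- xml_find_all(dfxml1, ".//PhysicalColumn") %>%
  map_df(~{
    xml_attrs(.x) %>%
      as.list()
  }) %>%
  mcga()

# Attributes of every element nested inside a LogicalTableSource
LogTable <- xml_find_all(dfxml1, ".//LogicalTableSource//*") %>%
  map_df(~{
    xml_attrs(.x) %>%
      as.list()
  }) %>%
  mcga()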
How do you want to trace these?
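One possible way to follow the IDs described in the question is to build a small lookup table for each relationship and join them in order. The sketch below uses only xml2 and base R on a trimmed-down copy of the sample XML (only the attributes needed for the joins are kept). The helper `ref_pairs`, the column names, and the final filter that matches the physical column name to the presentation column name are assumptions made for illustration; they are not part of the original answer.

```r
library(xml2)

# Trimmed-down copy of the question's XML: only the attributes used for the joins.
doc <- read_xml(paste0(
  '<Repository><DECLARE>',
  '<PhysicalColumn name="Department" parentId="3001:129" id="3003:484" ',
  'extName="//Table/SAMP_OFFICES_D/DEPARTMENT"/>',
  '<LogicalTable name="D2 Offices" id="2035:42562">',
  '<Columns><RefLogicalColumn id="2006:42564"/><RefLogicalColumn id="2006:42565"/></Columns>',
  '<TableSources><RefLogicalTableSource id="2037:43058"/></TableSources>',
  '</LogicalTable>',
  '<LogicalTableSource name="LTS1 Offices" id="2037:43058">',
  '<Link><StartNode><RefPhysicalTable id="3001:129"/></StartNode></Link>',
  '</LogicalTableSource>',
  '<PresentationColumn name="Department" id="4010:43649">',
  '<RefLogicalColumn id="2006:42565"/>',
  '</PresentationColumn>',
  '</DECLARE></Repository>'
))

# Hypothetical helper: one row per (parent id, referenced child id) pair.
ref_pairs <- function(parents, child_xpath, parent_col, child_col) {
  rows <- lapply(parents, function(p) {
    child_ids <- xml_attr(xml_find_all(p, child_xpath), "id")
    if (length(child_ids) == 0) return(NULL)
    out <- data.frame(xml_attr(p, "id"), child_ids, stringsAsFactors = FALSE)
    names(out) <- c(parent_col, child_col)
    out
  })
  do.call(rbind, rows)
}

# PresentationColumn -> logical column id
pres <- xml_find_all(doc, ".//PresentationColumn")
presentation <- data.frame(
  presentation_column = xml_attr(pres, "name"),
  logical_column_id   = xml_attr(xml_find_first(pres, "./RefLogicalColumn"), "id"),
  stringsAsFactors = FALSE
)

# LogicalTable -> its logical columns and its table sources
lt <- xml_find_all(doc, ".//LogicalTable")
lt_columns <- ref_pairs(lt, "./Columns/RefLogicalColumn",
                        "logical_table_id", "logical_column_id")
lt_sources <- ref_pairs(lt, "./TableSources/RefLogicalTableSource",
                        "logical_table_id", "lts_id")

# LogicalTableSource -> physical table id
lts <- xml_find_all(doc, ".//LogicalTableSource")
lts_tables <- ref_pairs(lts, ".//RefPhysicalTable", "lts_id", "physical_table_id")

# PhysicalColumn, keyed by the id of its parent physical table
pc <- xml_find_all(doc, ".//PhysicalColumn")
physical <- data.frame(
  physical_table_id = xml_attr(pc, "parentId"),
  physical_column   = xml_attr(pc, "name"),
  ext_name          = xml_attr(pc, "extName"),
  stringsAsFactors = FALSE
)

# Follow the chain of ids with successive joins.
lineage <- merge(presentation, lt_columns, by = "logical_column_id")
lineage <- merge(lineage, lt_sources, by = "logical_table_id")
lineage <- merge(lineage, lts_tables, by = "lts_id")
lineage <- merge(lineage, physical, by = "physical_table_id")

# Assumption: the wanted physical column has the same name as the presentation column.
lineage <- lineage[lineage$presentation_column == lineage$physical_column, ]
lineage[, c("presentation_column", "physical_column", "ext_name")]
```

On the full file the same joins should apply unchanged, but a physical table normally has many columns, so the name-matching filter (or a more reliable column mapping, if the repository XML provides one) decides which physical column is reported.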
Answer 1 (score: 0)
Can we modify this code so that the child elements' attribute values also end up in the same data frame?
LogTable <- xml_find_all(dfxml1, ".//LogicalTableSource//*") %>%
map_df(~{
xml_attrs(.x) %>%
as.list()
}) %>%
mcga()
The XML is:
<LogicalTableSource name="LTS1 Offices" parentName=""SampleApp Lite"."D2 Offices"" parentId="2035:42562" parentUid="80cb68bb-07f3-0000-714b-e31d00000000" id="2037:43058" uid="80cb6a2c-07f5-0000-714b-e31d00000000" isActive="true">
<Link>
<StartNode>
<RefPhysicalTable id="3001:129" uid="80ca6538-0bb9-0000-714b-e31d00000000" qualifiedName=""Sample App Lite Data"..."D20 Offices""/>
</StartNode>
</Link>
<WhereClause>
<Expr></Expr>
</WhereClause>
<GroupBy>
<Expr><![CDATA[ GROUPBYLEVEL("SampleApp Lite"."H2 Offices"."Offices Detail")]]></Expr>
</GroupBy>
<FragmentContent>
<Expr></Expr>
</FragmentContent>
</LogicalTableSource>
Currently I am getting the values for LogicalTableSource, but I need the values of RefPhysicalTable in the same data frame. Any help is greatly appreciated.
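A minimal sketch of one way this could be done: iterate over the LogicalTableSource nodes themselves (rather than over all of their descendants) and, for each one, look up its nested RefPhysicalTable and append that element's attributes as extra columns. The XML below is a trimmed-down copy of the sample, and the `ref_physical_table_*` column names are made up for illustration. It also assumes each LogicalTableSource has at most one RefPhysicalTable; only the first one is used.

```r
library(xml2)

# Trimmed-down copy of the LogicalTableSource element from the question.
doc <- read_xml(paste0(
  '<DECLARE>',
  '<LogicalTableSource name="LTS1 Offices" parentId="2035:42562" ',
  'id="2037:43058" isActive="true">',
  '<Link><StartNode>',
  '<RefPhysicalTable id="3001:129" uid="80ca6538-0bb9-0000-714b-e31d00000000"/>',
  '</StartNode></Link>',
  '</LogicalTableSource>',
  '</DECLARE>'
))

lts_nodes <- xml_find_all(doc, ".//LogicalTableSource")

rows <- lapply(lts_nodes, function(node) {
  # First RefPhysicalTable below this LogicalTableSource (a missing node if none).
  ref <- xml_find_first(node, ".//RefPhysicalTable")
  data.frame(
    as.list(xml_attrs(node)),                       # the parent's own attributes
    ref_physical_table_id  = xml_attr(ref, "id"),   # NA when there is no reference
    ref_physical_table_uid = xml_attr(ref, "uid"),
    stringsAsFactors = FALSE
  )
})

# rbind() requires every row to have the same columns, i.e. every
# LogicalTableSource must carry the same set of attributes.
lts_df <- do.call(rbind, rows)
lts_df
```

If the LogicalTableSource elements in the real file do not all share the same attributes, `dplyr::bind_rows(rows)` could be used in place of `do.call(rbind, rows)`, since it fills missing columns with NA.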