How do I convert any XML file to a data frame?

Asked: 2018-04-11 21:10:37

Tags: r

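I imported an XML file into R. I did some parsing on the data because it was very messy. Now I have a somewhat cleaner dataset called 'doc', and I am trying to export it, but I keep running into all kinds of errors.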

I tried:

library(rvest)
df<-xmlToDataFrame(nodes = getNodeSet(doc, "//outputColumn"))

I got:

Error in UseMethod("xpathApply") : 
  no applicable method for 'xpathApply' applied to an object of class "c('xml_document', 'xml_node')"
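
A note on this error: the class in the message, xml_document, is what the xml2 package (which rvest builds on) produces, whereas xmlToDataFrame() and getNodeSet() come from the separate XML package, which appears to be why no applicable method is found. Below is a minimal sketch of the same idea using xml2 only, assuming 'doc' really is an xml2 document. The inline sample XML and its shortened attribute values are made-up stand-ins, not the real file.

```r
library(xml2)

# Hypothetical stand-in for `doc`; the real object is assumed to be an
# xml2 document, as the class in the error message suggests.
doc <- read_xml('<outputColumns>
  <outputColumn refId="A.Columns[DEF_TAX_AMT]" dataType="i4" name="DEF_TAX_AMT"/>
  <outputColumn refId="A.Columns[OTS_BALANCE]" dataType="numeric" name="OTS_BALANCE" precision="19" scale="2"/>
</outputColumns>')

# xml2 counterpart of getNodeSet(doc, "//outputColumn")
nodes <- xml_find_all(doc, "//outputColumn")

# xml_attrs() returns one named character vector per node
attr_list <- xml_attrs(nodes)

# Nodes can carry different attribute sets, so align every row on the
# union of attribute names and fill the gaps with NA
all_names <- unique(unlist(lapply(attr_list, names)))
rows <- lapply(attr_list, function(a) {
  out <- setNames(rep(NA_character_, length(all_names)), all_names)
  out[names(a)] <- a
  as.data.frame(as.list(out), stringsAsFactors = FALSE)
})
df <- do.call(rbind, rows)
```

Since df is an ordinary data frame, it can be passed to write.csv() in the usual way.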

I tried:

write.csv(doc, file = "C:\\path_here\\MyData.csv")

I got:

Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : 
  cannot coerce class "c("xml_document", "xml_node")" to a data.frame

I tried:

write.table(doc, "C:\\path_here\\filename.txt, sep="\t", col.names=F)

I got:

Error: unexpected input in "write.table(doc, "C:\\path_here\\filename.txt, sep="\"
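
A note on this error: the file name string in the write.table() call is missing its closing quote after filename.txt, which is what the parser is complaining about. Even with the quote restored, write.table() would most likely fail in the same way as write.csv() above, because 'doc' is an XML document rather than a data frame. If dumping everything to a text file is enough, a minimal sketch follows, assuming 'doc' is an xml2 document. The sample XML and the temporary output path are hypothetical.

```r
library(xml2)

# Hypothetical stand-in for `doc` (assumed to be an xml2 document)
doc <- read_xml('<outputColumns><outputColumn refId="A" name="DEF_TAX_AMT"/></outputColumns>')

# Hypothetical output path; a temp file keeps the sketch self-contained
out_path <- file.path(tempdir(), "filename.txt")

# Option 1: serialise the whole document with xml2's own writer
write_xml(doc, out_path)

# Option 2: convert the document to a string and write it as plain text
# (this overwrites the file written by option 1)
writeLines(as.character(doc), out_path)
```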

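Is there a way to simplify this so that I can get everything into a data frame and then export it? Or just dump everything into a text file; that would be fine too. Thanks.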

'outputColumn refId =' 'DTS:REFID =' '/ outputColumns'

Here is a sample of the XML:

truncationRowDisposition="FailComponent"/><outputColumn refId="Package\Pre RawData\108_DataPull\108_1T2_Multi_MG_FUTURE\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output].Columns[DEF_TAX_AMT]" dataType="i4" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" externalMetadataColumnId="Package\Pre RawData\108_DataPull\108_1T2_Multi_MG_FUTURE\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output].ExternalColumns[DEF_TAX_AMT]" lineageId="Package\Pre RawData\108_DataPull\108_1T2_Multi_MG_FUTURE\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output].Columns[DEF_TAX_AMT]" name="DEF_TAX_AMT" truncationRowDisposition="FailComponent"/><outputColumn refId="Package\Pre RawData\108_DataPull\108_1T2_Multi_MG_FUTURE\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output].Columns[OTS_BALANCE]" dataType="numeric" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" externalMetadataColumnId="Package\Pre RawData\108_DataPull\108_1T2_Multi_MG_FUTURE\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output].ExternalColumns[OTS_BALANCE]" lineageId="Package\Pre RawData\108_DataPull\108_1T2_Multi_MG_FUTURE\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output].Columns[OTS_BALANCE]" name="OTS_BALANCE" precision="19" scale="2" truncationRowDisposition="FailComponent"/>

I want to pull out this part:

refId="Package\Pre RawData\108_DataPull\108_1T2_Multi_MG_FUTURE\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output].Columns[DEF_TAX_AMT]"

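It would be great if I could collect everything between the quotes on the lines containing 'outputColumn refId=' into a list and export that list to a file (CSV, TXT, or anything else). Likewise, it would be great if I could collect the 'DTS:refId=' values into a list and export that.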
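
One way to sketch this "everything between the quotes" extraction is a plain regular expression over the XML text, using base R only. The helper name extract_quoted, the sample text, and the output path below are all hypothetical. The approach assumes the attribute appears exactly as written (for example, refId directly after '<outputColumn '), so it is more fragile than a real XML parser.

```r
# Hypothetical sample text; in practice this could be as.character(doc)
# or the pasted result of readLines() on the original file
xml_text <- paste0(
  '<DTS:Executable DTS:refId="Package">',
  '<outputColumn refId="Pkg.Columns[DEF_TAX_AMT]" name="DEF_TAX_AMT"/>',
  '<outputColumn refId="Pkg.Columns[OTS_BALANCE]" name="OTS_BALANCE"/>',
  '</DTS:Executable>'
)

# Hypothetical helper: return every value that follows `prefix` and an
# opening double quote, up to (not including) the closing double quote
extract_quoted <- function(text, prefix) {
  # \Q...\E treats the prefix literally; the lookbehind keeps the prefix
  # and the opening quote out of the match
  pattern <- paste0("(?<=\\Q", prefix, "\\E\")[^\"]*")
  regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
}

column_ref_ids <- extract_quoted(xml_text, "<outputColumn refId=")
dts_ref_ids    <- extract_quoted(xml_text, "DTS:refId=")

# Export one of the lists to CSV (hypothetical temp path)
out_path <- file.path(tempdir(), "ref_ids.csv")
write.csv(data.frame(refId = column_ref_ids), out_path, row.names = FALSE)
```

The DTS:refId values can be exported the same way by passing dts_ref_ids to data.frame() instead.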

0 answers:

No answers