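I imported an XML file into R and did some parsing on the data, because it was very messy. Now I have a slightly cleaner dataset called 'doc' that I am trying to export, but I keep running into all sorts of errors.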
I tried:
library(rvest)
df<-xmlToDataFrame(nodes = getNodeSet(doc, "//outputColumn"))
I got:
Error in UseMethod("xpathApply") :
no applicable method for 'xpathApply' applied to an object of class "c('xml_document', 'xml_node')"
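This error comes from mixing packages: getNodeSet() and xmlToDataFrame() belong to the XML package and expect that package's own document objects, while rvest returns an xml2 `xml_document`. A minimal sketch of the same step using xml2 functions instead, assuming `doc` was created with rvest/xml2. The inline document is an abridged stand-in built from the sample in this question (only three attributes kept), and the choice of columns is illustrative:

```r
library(xml2)

# Abridged stand-in for `doc`: the two <outputColumn> elements from the sample,
# wrapped in an <outputColumns> parent so the document is well-formed.
prefix <- "Package\\Pre RawData\\108_DataPull\\108_1T2_Multi_MG_FUTURE\\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output]"
doc <- read_xml(paste0(
  "<outputColumns>",
  "<outputColumn refId=\"", prefix, ".Columns[DEF_TAX_AMT]\" dataType=\"i4\" name=\"DEF_TAX_AMT\"/>",
  "<outputColumn refId=\"", prefix, ".Columns[OTS_BALANCE]\" dataType=\"numeric\" name=\"OTS_BALANCE\"/>",
  "</outputColumns>"
))

# xml2 counterpart of getNodeSet(doc, "//outputColumn")
nodes <- xml_find_all(doc, "//outputColumn")

# One row per <outputColumn>; xml_attr() returns NA where an attribute is absent
df <- data.frame(
  refId    = xml_attr(nodes, "refId"),
  name     = xml_attr(nodes, "name"),
  dataType = xml_attr(nodes, "dataType"),
  stringsAsFactors = FALSE
)
print(df)
```

Picking named attributes keeps the columns fixed even when some elements carry extra attributes (such as `precision` and `scale` on only one of the sample columns).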
I also tried:
write.csv(doc, file = "C:\\path_here\\MyData.csv")
I got:
Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) :
cannot coerce class "c("xml_document", "xml_node")" to a data.frame
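write.csv() first coerces its argument to a data frame, and there is no way to do that with an xml_document. One workaround, sketched here under the assumption that the values have already been extracted as a character vector (hard-coded as the hypothetical `refs`), is to wrap the vector in a data frame before writing. tempfile() stands in for the question's C:\\path_here path so the sketch runs anywhere:

```r
# `refs` stands in for values pulled out of `doc`, e.g. with xml2::xml_attr()
prefix <- "Package\\Pre RawData\\108_DataPull\\108_1T2_Multi_MG_FUTURE\\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output]"
refs <- paste0(prefix, ".Columns[", c("DEF_TAX_AMT", "OTS_BALANCE"), "]")

# write.csv() needs something coercible to a data frame, so wrap the vector first
df <- data.frame(refId = refs, stringsAsFactors = FALSE)

# Replace tempfile() with "C:\\path_here\\MyData.csv" for a real export
out <- tempfile(fileext = ".csv")
write.csv(df, file = out, row.names = FALSE)

# Read the file back to check the round trip
back <- read.csv(out, stringsAsFactors = FALSE)
print(back)
```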
I also tried:
write.table(doc, "C:\\path_here\\filename.txt, sep="\t", col.names=F)
I got:
Error: unexpected input in "write.table(doc, "C:\\path_here\\filename.txt, sep="\"
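This one is a syntax error rather than a data problem: the file name is never closed with a quote, so everything from `"C:\\path_here` up to `sep="` becomes one string, and the parser then meets a bare backslash outside any string. Even with the quote restored, write.table() would attempt the same data-frame coercion that write.csv() fails on. A sketch with the quote fixed and a plain character vector (the hypothetical `refs`) as input, with tempfile() standing in for the real path:

```r
prefix <- "Package\\Pre RawData\\108_DataPull\\108_1T2_Multi_MG_FUTURE\\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output]"
refs <- paste0(prefix, ".Columns[", c("DEF_TAX_AMT", "OTS_BALANCE"), "]")

# Corrected call: the file name has its closing quote before `sep`
out <- tempfile(fileext = ".txt")
write.table(data.frame(refId = refs, stringsAsFactors = FALSE), out, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# For a single column, writeLines() produces the same file with less ceremony
out2 <- tempfile(fileext = ".txt")
writeLines(refs, out2)

print(readLines(out))
```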
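Is there a way to simplify this so that I can put everything into a data frame and then export it? Alternatively, I could just dump everything into a text file; that would be fine too. Thanks.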
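If a text dump of the whole document is enough, xml2 can serialise it directly with write_xml(). A sketch, assuming `doc` is an xml2 document (a tiny stand-in is built inline); the output file is still XML markup, just saved with a .txt extension:

```r
library(xml2)

# Tiny stand-in for the question's `doc`
doc <- read_xml("<outputColumns><outputColumn name=\"DEF_TAX_AMT\"/><outputColumn name=\"OTS_BALANCE\"/></outputColumns>")

# Write the whole document to disk as text
out <- tempfile(fileext = ".txt")
write_xml(doc, out)

# Re-reading the file yields an equivalent document
doc2 <- read_xml(out)
print(xml_attr(xml_find_all(doc2, "//outputColumn"), "name"))
```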
'outputColumn refId=' 'DTS:refId=' '/outputColumns'
Here is a sample of the XML:
truncationRowDisposition="FailComponent"/>
<outputColumn refId="Package\Pre RawData\108_DataPull\108_1T2_Multi_MG_FUTURE\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output].Columns[DEF_TAX_AMT]" dataType="i4" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" externalMetadataColumnId="Package\Pre RawData\108_DataPull\108_1T2_Multi_MG_FUTURE\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output].ExternalColumns[DEF_TAX_AMT]" lineageId="Package\Pre RawData\108_DataPull\108_1T2_Multi_MG_FUTURE\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output].Columns[DEF_TAX_AMT]" name="DEF_TAX_AMT" truncationRowDisposition="FailComponent"/>
<outputColumn refId="Package\Pre RawData\108_DataPull\108_1T2_Multi_MG_FUTURE\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output].Columns[OTS_BALANCE]" dataType="numeric" errorOrTruncationOperation="Conversion" errorRowDisposition="FailComponent" externalMetadataColumnId="Package\Pre RawData\108_DataPull\108_1T2_Multi_MG_FUTURE\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output].ExternalColumns[OTS_BALANCE]" lineageId="Package\Pre RawData\108_DataPull\108_1T2_Multi_MG_FUTURE\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output].Columns[OTS_BALANCE]" name="OTS_BALANCE" precision="19" scale="2" truncationRowDisposition="FailComponent"/>
This is the part I want to extract:
refId="Package\Pre RawData\108_DataPull\108_1T2_Multi_MG_FUTURE\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output].Columns[DEF_TAX_AMT]"
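Because the sample is a fragment (it starts in the middle of an element), an XML parser would reject it as it stands. A regular expression over the raw text avoids that. This sketch assumes `refId` is the first attribute after the element name, as in the sample; in practice `txt` would come from something like `paste(readLines(path), collapse = "\n")`, where `path` is your file:

```r
prefix <- "Package\\Pre RawData\\108_DataPull\\108_1T2_Multi_MG_FUTURE\\108_MULTI_MG_FUTURE.Outputs[OLE DB Source Output]"
# Abridged copy of the sample fragment (some attributes dropped)
txt <- paste0(
  'truncationRowDisposition="FailComponent"/>',
  '<outputColumn refId="', prefix, '.Columns[DEF_TAX_AMT]" dataType="i4" name="DEF_TAX_AMT"/>',
  '<outputColumn refId="', prefix, '.Columns[OTS_BALANCE]" dataType="numeric" name="OTS_BALANCE"/>'
)

# The lookbehind keeps only the text between the quotes after `<outputColumn refId="`
m <- gregexpr('(?<=<outputColumn refId=")[^"]*', txt, perl = TRUE)
ref_ids <- unlist(regmatches(txt, m))

# One value per line in a plain text file
out <- tempfile(fileext = ".txt")
writeLines(ref_ids, out)
print(ref_ids)
```

Swapping the lookbehind to `(?<=DTS:refId=")` collects the other list the same way.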
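It would be great if I could collect everything between the quotes on the lines containing 'outputColumn refId=' into a list and export that list to a file (CSV, TXT, or anything else). Likewise, it would be great if I could collect the 'DTS:refId=' values into a list and export them.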
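For the `DTS:refId` values, the attribute carries a namespace prefix, so xml2 needs the document's namespace map to resolve it. A sketch, assuming the file declares `xmlns:DTS` somewhere; the element names, namespace URI, and attribute values in the inline document are placeholders, not taken from the question:

```r
library(xml2)

# Hypothetical document using the DTS: prefix (placeholder names and URI)
doc <- read_xml(paste0(
  '<DTS:Executable xmlns:DTS="http://example.com/dts" DTS:refId="Package">',
  '<DTS:Executable DTS:refId="Package\\Pre RawData"/>',
  '</DTS:Executable>'
))

ns <- xml_ns(doc)  # maps the "DTS" prefix to whatever URI the file declares

# Every element that carries a DTS:refId attribute, in document order
nodes <- xml_find_all(doc, "//*[@DTS:refId]", ns = ns)
dts_ref_ids <- xml_attr(nodes, "DTS:refId", ns = ns)

out <- tempfile(fileext = ".csv")
write.csv(data.frame(refId = dts_ref_ids, stringsAsFactors = FALSE), out, row.names = FALSE)
print(dts_ref_ids)
```

Writing each list to its own file sidesteps the fact that the two lists will usually have different lengths.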