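Hello brave coders, I have a large number of XML files that I need to flatten, merge, and convert to a CSV file for import into Excel. The data will then be mapped to another dataset as part of a larger data migration.
I managed to produce a script that collects all XML files in a given folder and writes the first-level elements of each file as a new row in a combined CSV file. The problem is the structure of the XML files (which cannot be changed). Here is a sample of the XML: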
<!-- language: lang-xml -->

    <?xml version='1.0' encoding='UTF-8'?>
    <tmf_study_item>
      <a_acl_id type="String">0021AC7A0000000000081FDC</a_acl_id>
      <a_created_by type="String">xxxxxxxx</a_created_by>
      <ad_document_date type="LocalDate">2016-04-07</ad_document_date>
      <documents>
        <tmf_document>
          <a_acl_id type="String">0021AC7A00000000000823B1</a_acl_id>
          <a_acl_name type="String">TMF Study AC-064A201-Lupus Document Final ACL</a_acl_name>
          <a_modified_date type="Date">2016-04-19 05:28:06.708</a_modified_date>
          <multi_index_data>
            <multi_index_data>
              <amendment_number type="String"></amendment_number>
              <artifact_num type="String">01.05.03</artifact_num>
              <committee_type_code type="String"/>
            </multi_index_data>
          </multi_index_data>
          <related_placeholders>
          </related_placeholders>
          <shared_user>
          </shared_user>
          <workflow_user>
          </workflow_user>
          <contents>
            <content>
              <path>very_long_document_title1.pdf</path>
              <fileName>very_long_document_title1.pdf</fileName>
              <contentTypeId>pdf</contentTypeId>
              <mimeType>application/pdf</mimeType>
            </content>
            <content>
              <path>very_long_document_title1.docx</path>
              <fileName>very_long_document_title1.docx</fileName>
              <contentTypeId>word_docx</contentTypeId>
              <mimeType>application/vnd.openxmlformats-officedocument.wordprocessingml.document</mimeType>
            </content>
          </contents>
        </tmf_document>
      </documents>
    </tmf_study_item>

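You will notice that some of the last elements (such as `content` in the sample) appear twice in some files, and there may be more. What I need to know is how to flatten this XML hierarchy and give the child elements unique names, preferably in the form [parentElement][iteration][childElement], taking into account that the number of child elements can vary, so that the result can be exported to CSV. Unique headers for the child items are necessary so that the mapping in Excel can be done correctly.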
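To make the naming scheme concrete, here is a rough sketch of what such a flattening could look like. It is not a finished solution: the function name `ConvertTo-FlatRecord` and the `_` separator are assumptions for illustration, and the sample XML is a trimmed-down version of the file shown above.

```powershell
# Hypothetical helper (name and separator are assumptions): walks an element
# recursively and stores every leaf value under a key built from
# parent name + iteration number + child name.
function ConvertTo-FlatRecord {
    param(
        [System.Xml.XmlElement]$Element,
        [string]$Prefix = '',
        [System.Collections.Specialized.OrderedDictionary]$Record = [ordered]@{}
    )
    $seen = @{}   # how often each child name has occurred under this parent
    foreach ($child in $Element.SelectNodes('*')) {
        $seen[$child.LocalName] = 1 + [int]$seen[$child.LocalName]
        $count = $seen[$child.LocalName]
        if ($child.SelectNodes('*').Count -gt 0) {
            # Container element: recurse with "<prefix><name>_<iteration>_"
            $childPrefix = '{0}{1}_{2}_' -f $Prefix, $child.LocalName, $count
            $null = ConvertTo-FlatRecord -Element $child -Prefix $childPrefix -Record $Record
        } else {
            # Leaf element: number it only if the same name repeats
            $key = $Prefix + $child.LocalName
            if ($count -gt 1) { $key = '{0}_{1}' -f $key, $count }
            $Record[$key] = $child.InnerText
        }
    }
    return $Record
}

# Trimmed-down sample based on the XML in the question
[xml]$xml = @'
<tmf_study_item>
  <a_acl_id type="String">0021AC7A0000000000081FDC</a_acl_id>
  <documents>
    <tmf_document>
      <contents>
        <content><path>title1.pdf</path></content>
        <content><path>title1.docx</path></content>
      </contents>
    </tmf_document>
  </documents>
</tmf_study_item>
'@

$flat = ConvertTo-FlatRecord -Element $xml.DocumentElement
$row  = [pscustomobject]$flat   # one object (one CSV row) per XML file
$flat.Keys
```

For this sample the keys come out as `a_acl_id`, `documents_1_tmf_document_1_contents_1_content_1_path`, and `documents_1_tmf_document_1_contents_1_content_2_path`. The headers get long, but they stay unique however many `content` elements a file contains.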
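Here is the code I have written so far. My clumsy attempt is to process the first-level elements first, then the elements below "tmf_document", join them, and export the result to CSV. But for some reason I cannot figure out, I get this error: "Add-Member : Cannot add a member with the name [basically every element] because a member with that name already exists." The code: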

    # Get all XML files
    $rootElement = "tmf_study_item"
    $documentElement = "tmf_document"
    $midElement = "multi_index_data"
    $items = Get-ChildItem *.xml
    $scriptPath = $(Get-Location).Path
    $scriptFolder = Split-Path $(Get-Location).Path -Leaf
    $outputFile = $scriptPath + "\" + $scriptFolder + ".csv"

    # Loop through xmls and append them to the document
    foreach ($item in $items) {
        # Create filename for single CSV
        $baseNameoutputFile = $scriptPath + "\" + (Get-ChildItem $item).BaseName + ".csv"
        [XML]$xml = (Get-Content $item) # load xml document
        [Array]$RootitemConverted = $xml.GetElementsByTagName($rootElement)
        # Create array for document elements
        [Array]$DocConverted = $xml.GetElementsByTagName($documentElement)
        # Create array for multi_index_data
        [Array]$MidConverted = $xml.GetElementsByTagName($midElement)
        $Collection = @()
        Write-Host "Start processing new XML >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>"

        # Loop over 1st level elements in XML
        foreach ($Record in $RootitemConverted) {
            $Output = New-Object psobject
            $Record.SelectNodes("*") | ForEach-Object {
                Add-Member -InputObject $Output -MemberType NoteProperty -Name $_.Name -Value $_.'#text'
            }
            Write-Host "Add data to PSobject and collection >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>"
            if ($Collection) {
                $T2Keys = $Collection | Get-Member | Where-Object { $_.MemberType -match "Property" } | Select-Object -ExpandProperty Name
                $T1Keys = $Output | Get-Member | Where-Object { $_.MemberType -match "Property" } | Select-Object -ExpandProperty Name
                $KeysToAdd = $T2Keys | Where-Object { $T1Keys -notcontains $_ }
                $KeysToAdd | ForEach-Object { $Collection | Add-Member $_ "" }
            }
            $Collection += $Output
        }

        # Loop over documents level elements in XML
        foreach ($Documents in $DocConverted) {
            $DocOutput = New-Object psobject
            # Add prefix to document elements
            $Documents.SelectNodes("*") | ForEach-Object {
                $tmpName = $_.Name
                $tmpName = $documentElement + "_1_" + $tmpName
                Write-Host $tmpName
                Add-Member -InputObject $DocOutput -MemberType NoteProperty -Name $tmpName -Value $_.'#text'
            }
            Write-Host "Add data to PSobject and collection >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>"
            if ($Collection) {
                $T2Keys = $Collection | Get-Member | Where-Object { $_.MemberType -match "Property" } | Select-Object -ExpandProperty Name
                $T1Keys = $DocOutput | Get-Member | Where-Object { $_.MemberType -match "Property" } | Select-Object -ExpandProperty Name
                $KeysToAdd = $T2Keys | Where-Object { $T1Keys -notcontains $_ }
                $KeysToAdd | ForEach-Object { $Collection | Add-Member $_ "" }
            }
            $Collection += $DocOutput
        }

        $Collection
        # Append to CSV file
        $Collection | Export-Csv -Path $outputFile -Delimiter ";" -NoTypeInformation -Append
        # Create a CSV for each file
        #$Collection | Export-Csv -Path $baseNameoutputFile -Delimiter ";" -NoTypeInformation
    }

I hope some of you can give me some pointers on how to solve this.
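One plausible reading of the error, offered as a sketch rather than a verified fix: in the script, `$KeysToAdd` holds the property names that `$Collection` already has but the new object lacks, and those names are then added to `$Collection` again, which `Add-Member` rejects. A simpler route might be to collect the rows first and compute the union of all property names afterwards, since `Export-Csv` takes its columns from the first object it receives. The sample rows and column names below are made up for illustration.

```powershell
# Two sample rows with different property sets (made-up values, not real data)
$rows = @(
    [pscustomobject]@{ a_acl_id = 'A1'; content_1_path = 'one.pdf' }
    [pscustomobject]@{ a_acl_id = 'A2'; content_1_path = 'two.pdf'; content_2_path = 'two.docx' }
)

# Build the union of all property names, in first-seen order
$headers = @()
foreach ($row in $rows) {
    foreach ($property in $row.PSObject.Properties) {
        if ($headers -notcontains $property.Name) { $headers += $property.Name }
    }
}

# Select-Object adds any missing property as an empty value,
# so every row ends up with the same set of columns.
$csv = $rows |
    Select-Object -Property $headers |
    ConvertTo-Csv -Delimiter ';' -NoTypeInformation

$csv
```

Swapping `ConvertTo-Csv` for `Export-Csv -Path ...` would write the file instead. Doing this once after all XML files are processed, rather than with `-Append` per file, avoids rows whose columns differ from the header already on disk.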