PowerShell script to import, flatten and merge multiple XML files and export them as CSV

Time: 2018-01-23 17:06:36

Tags: xml powershell csv

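Hello brave coders, I have a large number of XML files that I need to flatten, merge and convert into a CSV file for import into Excel. The data will then be mapped to another data set as part of a larger data migration.

I managed to produce a script that collects all XML files in a given folder and writes the first-level elements of each file as a new row in a combined CSV file. The problem is the structure of the XML files (which cannot be changed). Here is a sample of the XML: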

<!-- language: lang-xml -->
<?xml version='1.0' encoding='UTF-8'?>

<tmf_study_item>
  <a_acl_id type="String">0021AC7A0000000000081FDC</a_acl_id>
  <a_created_by type="String">xxxxxxxx</a_created_by>
  <ad_document_date type="LocalDate">2016-04-07</ad_document_date>
  <documents>
    <tmf_document>
      <a_acl_id type="String">0021AC7A00000000000823B1</a_acl_id>
      <a_acl_name type="String">TMF Study AC-064A201-Lupus Document Final ACL</a_acl_name>
      <a_modified_date type="Date">2016-04-19 05:28:06.708</a_modified_date>
      <multi_index_data>
        <multi_index_data>
          <amendment_number type="String"></amendment_number>
          <artifact_num type="String">01.05.03</artifact_num>
          <committee_type_code type="String"/>
        </multi_index_data>
      </multi_index_data>
      <related_placeholders>
      </related_placeholders>
      <shared_user>
      </shared_user>
      <workflow_user>
      </workflow_user>
      <contents>
        <content>
          <path>very_long_document_title1.pdf</path>
          <fileName>very_long_document_title1.pdf</fileName>
          <contentTypeId>pdf</contentTypeId>
          <mimeType>application/pdf</mimeType>
        </content>
        <content>
          <path>very_long_document_title1.docx</path>
          <fileName>very_long_document_title1.docx</fileName>
          <contentTypeId>word_docx</contentTypeId>
          <mimeType>application/vnd.openxmlformats-officedocument.wordprocessingml.document</mimeType>
        </content>
      </contents>
    </tmf_document>
  </documents>
</tmf_study_item>

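You will notice that some of the last elements occur twice in some files, and there may be more. So what I need to know is how to flatten this XML hierarchy and give the child elements unique names, preferably in the form [parent.element][iteration][childElement]. The number of child elements may vary, and the result should still be exportable to CSV. Unique headers for the child items are necessary so that the mapping can be done correctly in Excel.

Here is the code I have written so far. What I am clumsily trying to do is to process the first-level elements first, then the elements below "tmf_document", join them and then export to CSV. But for a reason I cannot figure out I get the error: "Add-Member : Cannot add a member with the name [basically all elements] because a member with that name already exists." The code: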
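To make the naming scheme described above concrete, here is a minimal sketch of one way it could be done. The function name `ConvertTo-FlatRow` and the `_N` suffix convention are assumptions made for illustration, not something from the script in the question. Every element that has child elements gets an index, and a repeated leaf element gets an index from its second occurrence on, so header names stay the same whether a file contains one `content` block or several.

```powershell
# Hypothetical helper (an assumption, not part of the original script):
# recursively flattens an XML element into one ordered dictionary that maps
# a unique column name to the text of each leaf element.
function ConvertTo-FlatRow {
    param(
        [Parameter(Mandatory)] [System.Xml.XmlElement] $Element,
        [string] $Prefix = '',
        [System.Collections.Specialized.OrderedDictionary] $Row = [ordered]@{}
    )

    $index = @{}   # occurrences seen so far, per tag name, among these siblings
    foreach ($child in @($Element.SelectNodes('*'))) {
        $tag = $child.LocalName
        $index[$tag] = 1 + $index[$tag]
        $isContainer = $child.SelectNodes('*').Count -gt 0

        # Containers are always numbered; leaves only when the tag repeats
        $name = $tag
        if ($isContainer -or $index[$tag] -gt 1) {
            $name = '{0}_{1}' -f $tag, $index[$tag]
        }
        $path = if ($Prefix) { "$Prefix.$name" } else { $name }

        if ($isContainer) {
            # Recurse and carry the path so far as prefix; results land in $Row
            $null = ConvertTo-FlatRow -Element $child -Prefix $path -Row $Row
        } else {
            # Leaf (or empty element): store its text, '' when there is none
            $Row[$path] = $child.InnerText.Trim()
        }
    }
    return $Row
}

# Small stand-in document, shaped like the contents section of the sample
[xml]$demo = '<item><id>1</id><contents><content><path>a.pdf</path></content><content><path>a.docx</path></content></contents></item>'
$flat = ConvertTo-FlatRow -Element $demo.DocumentElement
[pscustomobject]$flat
```

Called on the root element of the sample file, this sketch would produce column names such as `documents_1.tmf_document_1.contents_1.content_2.fileName`, giving one row per XML file.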

# Get all XML files
$rootElement = "tmf_study_item"
$documentElement = "tmf_document"
$midElement = "multi_index_data"
$items = Get-ChildItem *.xml
$scriptPath = $(get-location).Path
$scriptFolder = split-path $(get-location).Path -Leaf
$outputFile = $scriptPath+"\"+$scriptFolder+".csv"

# Loop through xmls and append them to the document
foreach ($item in $items) {
    # Create filename for single CSV
    $baseNameoutputFile = $scriptPath+"\"+(Get-ChildItem $item).BaseName+".csv"

    [XML]$xml = (Get-Content $item) #load xml document
    [Array]$RootitemConverted = $xml.GetElementsByTagName($rootElement)

    # Create array for document elements
    [Array]$DocConverted = $xml.GetElementsByTagName($documentElement)

    # Create array for multi_index_data
    [Array]$MidConverted = $xml.GetElementsByTagName($midElement)

    $Collection = @()
    Write-host "Start processing new XML >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>"

    # Loop over 1st level elements in XML------------------------------------------------------------
    ForEach($Record in $RootitemConverted){

        $Output = new-object psobject

        $Record.selectnodes("*")|%{
            Add-Member -InputObject $Output -MemberType NoteProperty -Name $_.Name -Value $_.'#text'
        }

        Write-host "Add data to PSobject and collection >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>"

        If($Collection){
            $T2Keys = $Collection|gm|?{$_.MemberType -match "Property"}|Select -ExpandProperty Name
            $T1Keys = $Output|gm|?{$_.MemberType -match "Property"}|Select -ExpandProperty Name
            $KeysToAdd = $T2Keys|?{$T1Keys -notcontains $_}
            $KeysToAdd|%{$Collection|Add-Member $_ ""}
        }

        $Collection += $Output
    }


    # Loop over documents level elements in XML-----------------------------------------------------------------
    ForEach($Documents in $DocConverted){

        $DocOutput = new-object psobject
        #Add Prefix to document-elements
        $Documents.selectnodes("*")|%{
            $tmpName = $_.Name
            $tmpName = $documentElement+"_1_"+$tmpName
            Write-Host $tmpName
            Add-Member -InputObject $DocOutput -MemberType NoteProperty -Name $tmpName -Value $_.'#text'
        }

        Write-host "Add data to PSobject and collection >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>"

        If($Collection){
            $T2Keys = $Collection|gm|?{$_.MemberType -match "Property"}|Select -ExpandProperty Name
            $T1Keys = $DocOutput|gm|?{$_.MemberType -match "Property"}|Select -ExpandProperty Name
            $KeysToAdd = $T2Keys|?{$T1Keys -notcontains $_}
            $KeysToAdd|%{$Collection|Add-Member $_ ""}
        }

        $Collection += $DocOutput
    }

    $Collection

    # Append to CSV File
    $Collection | Export-Csv -path $outputFile -Delimiter ";" -NoTypeInformation -Append

    # Create a CSV for each file
    #$Collection | Export-Csv -path $baseNameoutputFile -Delimiter ";" -NoTypeInformation
}
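
A note on where the error plausibly comes from, judging only from the script above: in the second loop, `$KeysToAdd` holds the property names that exist in `$Collection` but are missing from `$DocOutput`, and the next line adds those names to `$Collection`, whose objects already carry them. The following is a minimal sketch of an alternative that avoids `Add-Member` for padding: collect the union of all column names first, then let `Select-Object` fill in the gaps. The sample rows and the variable names `$rows`, `$columns` and `$padded` are assumptions made for illustration.

```powershell
# Two rows with different columns, standing in for $Output and $DocOutput
$rows = @(
    [pscustomobject]@{ a_acl_id = '0021'; a_created_by = 'xxxxxxxx' }
    [pscustomobject]@{ tmf_document_1_a_acl_id = '0021'; tmf_document_1_a_acl_name = 'Final ACL' }
)

# Build the union of all property names, in first-seen order
$columns = New-Object 'System.Collections.Generic.List[string]'
foreach ($row in $rows) {
    foreach ($property in $row.PSObject.Properties) {
        if (-not $columns.Contains($property.Name)) {
            $columns.Add($property.Name)
        }
    }
}

# Select-Object creates every requested property and leaves the ones a row
# lacks empty, so no existing member is ever added a second time
$padded = @($rows | Select-Object -Property $columns.ToArray())

# ConvertTo-Csv is used here so the sketch writes no file; Export-Csv accepts
# the same -Delimiter and -NoTypeInformation parameters
$csv = $padded | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
$csv
```

Because the CSV cmdlets take their header from the first object they receive, padding every row to the same set of columns before a single export at the end is likely to be more robust than appending file by file with differing columns.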

I hope some of you can give me some pointers on how to solve this.

0 answers:

No answers