Advanced insertion into an XML file

Posted: 2017-08-24 17:52:15

Tags: xml powershell

Problem statement

Please see this previous question of mine. I'm trying to achieve something similar, but this time with more advanced criteria.

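In short, I need to add a new element (XML tag) directly below each <NETTOTAL> element, inside the same <RECORD>. The element's text is the 8-digit salesperson code that appears in parentheses in that record's <ITEMDESC>, so it comes from the same XML file. As the script below shows, these codes are extracted and stored in an array for later processing.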
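
The key point is that each code belongs to one specific <RECORD>, so it helps to keep the code tied to its record instead of collecting all codes into a loose array. Below is a minimal Python sketch of that extraction step (Python is used here only for illustration; the question itself is PowerShell). It assumes each <RECORD> contains one <ITEMDESC> with the code in parentheses; the names `SAMPLE`, `CODE_RE` and `salesman_codes` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical trimmed copy of the question's XML: only RECORD, NETTOTAL and ITEMDESC kept.
SAMPLE = """<EXPORT><CUSTORDERS>
<RECORD CODE="NX0100103"><NETTOTAL>97.40</NETTOTAL>
<ITEMDESC>Salesperson: firstName1 lastName1 (43700006)</ITEMDESC></RECORD>
<RECORD CODE="NX0100104"><NETTOTAL>38.20</NETTOTAL>
<ITEMDESC>Salesperson: firstName2 lastName2 (43100015)</ITEMDESC></RECORD>
<RECORD CODE="NX0100105"><NETTOTAL>63.00</NETTOTAL>
<ITEMDESC>Salesperson: firstName3 lastName3 (43100014)</ITEMDESC></RECORD>
<RECORD CODE="NX0100106"><NETTOTAL>55.00</NETTOTAL>
<ITEMDESC>Salesperson: firstName2 lastName2 (43100015)</ITEMDESC></RECORD>
</CUSTORDERS></EXPORT>"""

# An 8-digit code wrapped in parentheses, e.g. "(43700006)".
CODE_RE = re.compile(r"\((\d{8})\)")


def salesman_codes(xml_text):
    """Return one (RECORD CODE, salesman code or None) pair per <RECORD>, in document order."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iter("RECORD"):
        # ".//" searches all descendants, so the real DOCLINES/LINE nesting works too.
        desc = record.findtext(".//ITEMDESC", default="")
        match = CODE_RE.search(desc)
        pairs.append((record.get("CODE"), match.group(1) if match else None))
    return pairs


print(salesman_codes(SAMPLE))
```

The codes come out in the same order as the example array in the script's comment, but each one stays attached to the record it was read from.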

About the script

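The existing script runs, but I suspect the loop logic is wrong. Instead of selecting, looping, and placing the same child every time, it should place one XML tag below each <NETTOTAL>, holding the 8-digit code that belongs to that record.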

Original XML file content

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<EXPORT>
    <IMPORTMODEL>NEX</IMPORTMODEL>
    <SESSION>1000061</SESSION>
    <CUSTORDERS>
        <RECORD CODE="NX0100103">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>97.40</NETTOTAL>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName1 lastName1 (43700006)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100104">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>38.20</NETTOTAL>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName2 lastName2 (43100015)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100105">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>63.00</NETTOTAL>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName3 lastName3 (43100014)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100106">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>55.00</NETTOTAL>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName2 lastName2 (43100015)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
    </CUSTORDERS>
</EXPORT>

Desired result

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<EXPORT>
    <IMPORTMODEL>NEX</IMPORTMODEL>
    <SESSION>1000061</SESSION>
    <CUSTORDERS>
        <RECORD CODE="NX0100103">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>97.40</NETTOTAL>
            <SALESMAN>43700006</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName1 lastName1 (43700006)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100104">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>38.20</NETTOTAL>
            <SALESMAN>43100015</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName2 lastName2 (43100015)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100105">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>63.00</NETTOTAL>
            <SALESMAN>43100014</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName3 lastName3 (43100014)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100106">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>55.00</NETTOTAL>
            <SALESMAN>43100015</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName2 lastName2 (43100015)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
    </CUSTORDERS>
</EXPORT>

Script

$xmlFilesLocation = "C:\XML_dumping"

cd $xmlFilesLocation

$netTotalRegEx = "(<NETTOTAL>\d{1,30}\.\d{1,2}<\/NETTOTAL>)"
$salesManRegEx = "(<SALESMAN>\d{8}<\/SALESMAN>)"

$beginTag = "`t`t`t<SALESMAN>"
$endTag = "</SALESMAN>"

$files = Get-ChildItem -Path $xmlFilesLocation -Filter *.xml

$numberOfFiles = (Get-ChildItem -Path $xmlFilesLocation -Filter *.xml | Measure-Object).Count

# First, loop through all files separately to check if <SALESMAN>[code]</SALESMAN> exists, and skip if true
for ($i=1; $i -le $numberOfFiles; $i++) {
    $content = (Get-Content $files[$i - 1] -Raw)

    # Skip file if <SALESMAN>[code]</SALESMAN> is detected in it
    if ($content -match $salesManRegEx) { break }
}

# Then, loop through all files (again) separately to check if <SALESMAN>[code]</SALESMAN> is missing, and process if true
for ($j=1; $j -le $numberOfFiles; $j++) {
    $content = (Get-Content $files[$j - 1] -Raw)

    # If <SALESMAN>[code]</SALESMAN> is missing in the file
    if ($content -notmatch $salesManRegEx) {
        $contentArray = @()

        # Hold all the content, but split from the brackets
        $contentArray = $content
        $contentArray = $contentArray.Split("()")
        # Now split by line to extract the salesman codes into an array.
        # Example: [43700006, 43100015, 43100014, 43100015]
        $contentArray = $contentArray.Split("")

        for ($k=1; $k -le $contentArray.Length; $k++) {
            # if the salesman code is found...
            if ($contentArray[$k] -match "^\d{8}$") {
                if ($content -notmatch $salesManRegEx) {
                    # Construct the full tag
                    $fullSalesManTag = $beginTag + $contentArray[$k] + $endTag

                    # ...then replace in $content the regular expression with $fullSalesManTag and insert it directly underneath NETTOTAL line
                    $content= [regex]::Replace($content, $netTotalRegEx, ('$1' + "`n" + "$fullSalesManTag"))

                    $content | Out-File -Encoding UTF8 $files[$j - 1]
                }
            }
        }
    }
}

Current output

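The output shows that the same array element, 43700006 (the first code found), was added to every record; after that single pass the loop inserts nothing more. I understand why this happens, but I can't work out how to fix the logic.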

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<EXPORT>
    <IMPORTMODEL>NEX</IMPORTMODEL>
    <SESSION>1000061</SESSION>
    <CUSTORDERS>
        <RECORD CODE="NX0100103">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>97.40</NETTOTAL>
            <SALESMAN>43700006</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName1 lastName1 (43700006)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100104">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>38.20</NETTOTAL>
            <SALESMAN>43700006</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName2 lastName2 (43100015)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100105">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>63.00</NETTOTAL>
            <SALESMAN>43700006</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName3 lastName3 (43100014)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
        <RECORD CODE="NX0100106">
            <VATMODE>X</VATMODE>
            <INPUTDATE>26/07/2017</INPUTDATE>
            <NETTOTAL>55.00</NETTOTAL>
            <SALESMAN>43700006</SALESMAN>
            <DOCLINES>
                <LINE>
                    <LINETYPE>M</LINETYPE>
                    <ITEMDESC>Salesperson: firstName2 lastName2 (43100015)</ITEMDESC>
                </LINE>
            </DOCLINES>
        </RECORD>
    </CUSTORDERS>
</EXPORT>
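
Why the same code lands everywhere: [regex]::Replace(input, pattern, replacement) replaces every match of $netTotalRegEx in one call, so the first 8-digit code found is written under all four <NETTOTAL> elements at once. After that, $content matches $salesManRegEx, so the inner `if` rejects every later code. The minimal Python sketch below (Python used only for illustration; the sample `content` and the names `buggy`, `fixed` and `next_code` are hypothetical) reproduces the effect and shows one regex-level fix: a replacement function that hands out the next code for each match. .NET's Regex.Replace has a comparable overload taking a MatchEvaluator. This pairing only works if the codes are in the same order as the <NETTOTAL> elements, one per record, which is one reason the answer recommends a real XML parser instead.

```python
import re

# Hypothetical stand-in for $content: only the NETTOTAL lines matter here.
content = (
    "<NETTOTAL>97.40</NETTOTAL>\n"
    "<NETTOTAL>38.20</NETTOTAL>\n"
    "<NETTOTAL>63.00</NETTOTAL>\n"
    "<NETTOTAL>55.00</NETTOTAL>\n"
)
codes = ["43700006", "43100015", "43100014", "43100015"]

# Equivalent to $netTotalRegEx in the question's script.
NET_TOTAL_RE = re.compile(r"(<NETTOTAL>\d{1,30}\.\d{1,2}</NETTOTAL>)")

# What the script effectively does: one global replace using the first code,
# so every NETTOTAL gets the same SALESMAN line.
buggy = NET_TOTAL_RE.sub(r"\1\n<SALESMAN>" + codes[0] + "</SALESMAN>", content)

# One regex-level fix: a replacement function that consumes the next code per match.
next_code = iter(codes)
fixed = NET_TOTAL_RE.sub(
    lambda m: m.group(1) + "\n<SALESMAN>" + next(next_code) + "</SALESMAN>",
    content,
)

print(buggy)
print(fixed)
```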

1 Answer

Answer 0 (score: 4)

Do not parse XML with regex. Every time you do, a rainbow unicorn dies.

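But seriously: in most cases, regular expressions are the wrong tool for working with XML. If you're interested, the answers to this question (thanks to kjhughes for the link) go into depth about the problems with the regex approach.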

Use a proper XML parser and a couple of XPath expressions to extract the salesman ID and add it as a new node:

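$xmlfile = 'C:\path\to\your.xml'

# XmlDocument.Load honors the encoding declared in the file
$xml = New-Object System.Xml.XmlDocument
$xml.Load($xmlfile)

$xml.SelectNodes('//RECORD') | ForEach-Object {
  # Take the digits between the parentheses in this record's <ITEMDESC>
  $id = $_.SelectSingleNode('.//ITEMDESC').'#text' -replace '.*\((\d+)\).*', '$1'

  # The new node goes directly after this record's <NETTOTAL>
  $sibling = $_.SelectSingleNode('./NETTOTAL')

  $node = $xml.CreateElement('SALESMAN')
  $node.InnerText = $id
  # InsertAfter returns the inserted node; discard it so it isn't echoed
  [void]$_.InsertAfter($node, $sibling)
}

$xml.Save($xmlfile)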
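
For comparison, here is a minimal sketch of the same parser-based approach using Python's standard xml.etree.ElementTree (Python is used only for illustration; `add_salesman`, `process_file` and `SAMPLE` are hypothetical names). Like the answer, it finds each <RECORD>, pulls the digits in parentheses out of <ITEMDESC>, and inserts <SALESMAN> directly after <NETTOTAL>. It also skips records that already contain a <SALESMAN>, which is what the question's script was trying to do.

```python
import re
import xml.etree.ElementTree as ET

# Digits inside parentheses, like the answer's '.*\((\d+)\).*' pattern.
CODE_RE = re.compile(r"\((\d+)\)")


def add_salesman(root):
    """Insert <SALESMAN> right after <NETTOTAL> in each <RECORD> that has none yet."""
    # list() so the tree is not being walked while we insert into it.
    for record in list(root.iter("RECORD")):
        if record.find("SALESMAN") is not None:
            continue  # already processed; the skip the question's script attempted
        nettotal = record.find("NETTOTAL")
        match = CODE_RE.search(record.findtext(".//ITEMDESC", default=""))
        if nettotal is None or match is None:
            continue  # nothing to anchor to, or no code to insert
        salesman = ET.Element("SALESMAN")
        salesman.text = match.group(1)
        salesman.tail = nettotal.tail  # reuse NETTOTAL's newline + indentation
        record.insert(list(record).index(nettotal) + 1, salesman)


def process_file(path):
    """Hypothetical helper: rewrite one XML file in place."""
    tree = ET.parse(path)
    add_salesman(tree.getroot())
    tree.write(path, encoding="UTF-8", xml_declaration=True)


# Usage on an in-memory sample (two records from the question, without whitespace).
SAMPLE = (
    '<EXPORT><CUSTORDERS>'
    '<RECORD CODE="NX0100103"><NETTOTAL>97.40</NETTOTAL><DOCLINES><LINE>'
    '<ITEMDESC>Salesperson: firstName1 lastName1 (43700006)</ITEMDESC></LINE></DOCLINES></RECORD>'
    '<RECORD CODE="NX0100104"><NETTOTAL>38.20</NETTOTAL><DOCLINES><LINE>'
    '<ITEMDESC>Salesperson: firstName2 lastName2 (43100015)</ITEMDESC></LINE></DOCLINES></RECORD>'
    '</CUSTORDERS></EXPORT>'
)
root = ET.fromstring(SAMPLE)
add_salesman(root)
print(ET.tostring(root, encoding="unicode"))
```

If you write files back with `process_file`, note that ElementTree emits its own XML declaration and does not keep `standalone="yes"`; whether the importing system cares about that attribute is worth checking.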