Question

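We are processing a text file that contains many different types of reports. Some of these reports need certain words changed; the rest should simply be copied as they are.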

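The file has to remain a single text file, so the idea is to move through the file, comparing lines. If the line found is "ReportType1", we need to change certain wording, so we go into an inner loop, extracting the data and changing words along the way. The loop ends when it reaches the footer of the report, and processing should then continue with the next report.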

We have tried -match, -like, -contains, and -eq, but it never works as expected. We either get data changed/reformatted that should not be, or we only get the header data.

Add-Type -AssemblyName System.Collections
Add-Type -AssemblyName System.Text.RegularExpressions

[System.Collections.Generic.List[string]]$content = @()

$inputFile   = "drive\folder\inputfile.txt"
$outputFile  = "drive\folder\outputfile.txt"

#This will retrieve the total number of lines in the file
$FileContent = Get-Content $inputFile
$FileLineCount = $FileContent | Measure-Object -Line
$TotalLines = $FileContent.Count

$TotalLines++ #Need to increase by one; the last line is blank

$startLine   = 0
$lineCounter = 0

#Start reading the file; this is the Header section
#Number of lines may vary, but data is copied over word
#for word
foreach($line in Get-Content $inputfile)
{
    $startLine++
    If($line -match "FOOTER")
    {
        [void]$content.Add( $line )
        break
    }
    else
    {
        [void]$content.Add( $line )
    }
}
## ^^This section works perfectly

#Start reading the body of the file
Do {
    #Start reading from the current position
    #This should change with each report read
    $line = Get-Content $inputFile | select -Skip $startLine

    If($line -match "ReportType1") #If it's a ReportType1, some wording needs to be changed
    {
        #Start reading the file from the current position
        #Should loop through this record only
        foreach($line in Get-Content $inputFile | select -skip $startline) 
        {
            If($line -match "FOOTER") #End of the current record
            {
                [void]$content.Add( $line )
                break #break out of the loop and continue reading the file from the new current position
            }
            elseif ($line -match "OldWord") #Have to replace a word on some lines
            {
                $line = $line.Replace("OldWord","NewWord")
                [void]$content.Add( $line ) 
            }
            else
            { 
                [void]$content.Add( $line ) 
            }
            $startline++                
        }
    }
    else
    {
         If($line -match "ReportType2") #ReportType2 can just be copied over line for line
         {
             #Start reading the file from the current position
             #Should loop through this record only
             foreach($line in Get-Content $inputFile | select -skip $startline) 
             {
                If($line -match "FOOTER") #End of the current record
                {
                    [void]$content.Add( $line )
                    break #break out of the loop and continue reading the file from the new current position
                }
                else
                { 
                    [void]$content.Add( $line ) 
                }
                $startline++                
             }
         }
    }
    $startline++
} until ($startline -eq $TotalLines)

[System.IO.File]::WriteAllLines( $outputFile, $content ) | Out-Null

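This works, but we are getting some unexpected behavior. The reports look fine, except that words are also being changed in "ReportType2", even though the code is not set up to do that. It is as if execution only ever goes through the first IF statement. But how can it, if the line does not match?

We know the $startline variable increases throughout the iterations, so it is not stuck on one line. However, doing a "Write-Host" shows that $line is always "ReportType1", which cannot be right, because the lines show up in the report the way they should.

Sample data: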

<header data>
.
43 lines (although this can vary)
.
<footer>
<ReportType1> 
. 
x number of lines (varies)
. 
<footer> 
<ReportType2> 
. 
x number of lines (varies)
. 
<footer>

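And so on, until the end of the file. The different report types are mixed together.

All we can think of is that we are missing something, possibly something obvious, that would make this output the data correctly.

Any help is appreciated.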

Answer 1

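The following should do what you want. Just replace the values of $oldword and $newword with your word replacement (currently case-insensitive), and the value of $report with the header of the report you want to update.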

$oldword = [regex]::Escape("Liability")
$newword = "Asset"
$report = "ReportType1"
$data = Get-Content Input.txt
$reports = $data | Select-String -Pattern $Report -AllMatches
$footers = $data | Select-String -Pattern "FOOTER" -AllMatches
$startindex = 0
[collections.arraylist]$output = foreach ($line in $reports) {
    $section = ($line.linenumber-1),($footers.linenumber.where({$_ -gt $line.linenumber},'First')[0]-1)
    if ($startindex -lt $section[0]-1) {
        $data[$startindex..($section[0]-1)]
    }
    if ($startindex -eq $section[0]-1) {
        $data[$startindex]
    }
    $data[$section[0]..$section[1]] -replace $oldword,$newword
    $startindex = $section[1]+1
}
if ($startindex -eq $data.count-1) {
    [void]$output.Add($data[$startindex])
}
if ($startindex -lt $data.count-1) {
    [void]$output.Add($data[$startindex..($data.count-1)])
}
$output | Set-Content Output.txt

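Code explanation:

$oldword is used in a regex replace operation, so any special regex characters in it need to be escaped. I chose to do that for you here. If you want to change the string being replaced, just update the characters between the quotes. The match is case-insensitive when we pass it to the -replace operator.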
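To illustrate why the escaping matters, here is a minimal sketch. The sample strings are made up for the example and do not come from the original file.

```powershell
# Hypothetical search text that contains regex metacharacters.
$raw     = 'Cost (USD)'
$escaped = [regex]::Escape($raw)

# Unescaped, the parentheses form a capture group, so the literal text is not found
# and the input comes back unchanged.
$unescapedResult = 'Total Cost (USD): 10' -replace $raw, 'Price'

# Escaped, the pattern matches the literal text.
$escapedResult = 'Total Cost (USD): 10' -replace $escaped, 'Price'

# -replace ignores case by default; -creplace is the case-sensitive variant.
$insensitive = 'LIABILITY total' -replace 'Liability', 'Asset'
$sensitive   = 'LIABILITY total' -creplace 'Liability', 'Asset'
```

If a case-sensitive replacement is what you need, swapping -replace for -creplace in the answer's code should be enough.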

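$newword is simply the string that replaces $oldword in the output. It needs no special treatment unless the string contains characters that are special to PowerShell. The replacement text appears with exactly the case you give it.

$report is the name of the header of the section whose data you want to replace. The match is case-insensitive when we pass it to Select-String -Pattern.

$data is just the contents of the file as an array. Each line of the file is an indexed item in the array.

The first Select-String does a regex match, with the regex pattern given by -Pattern $Report. It uses regex because we did not specify the -SimpleMatch parameter. -AllMatches tells Select-String to record every match within a line rather than only the first; each matching line in the file produces its own MatchInfo object either way. The output is stored in $Reports. $Reports is an array of MatchInfo objects, which have properties we will use, such as Line and LineNumber.

The second Select-String does a regex match with the pattern -Pattern "FOOTER". If the footer text can change, you can put it in a variable. It uses regex for the same reason, because -SimpleMatch is not specified, and -AllMatches is included in the same way.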
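As a rough illustration of what the two Select-String calls return, the sketch below runs them against a small in-memory array. The sample lines are invented for the example and stand in for the output of Get-Content.

```powershell
# Made-up sample lines standing in for Get-Content output.
$data = @(
    '<footer>'
    '<ReportType2>'
    'data'
    '<footer>'
    '<ReportType1>'
    'data'
    '<footer>'
)

$reports = $data | Select-String -Pattern 'ReportType1'
$footers = $data | Select-String -Pattern 'FOOTER'   # case-insensitive, so '<footer>' matches

# LineNumber is 1-based, so subtract 1 before using it as an index into $data.
$reportLines = @($reports.LineNumber)   # 5
$footerLines = @($footers.LineNumber)   # 1, 4, 7
```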

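$startIndex keeps track of where we are in the array. It helps us grab the different parts of the selected text.

$output is an array list containing the lines we read from $data and the selected text that matches your report header (the Select-String -Pattern $Report output). It is an array list so that we can use the Add() method to build the collection more efficiently. That is far more efficient than using += with an array of custom objects.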
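For readers unfamiliar with the difference, here is a minimal sketch contrasting the two ways of growing a collection. The line values are placeholders.

```powershell
# An ArrayList grows in place.
$list = New-Object System.Collections.ArrayList
[void]$list.Add('line 1')                  # Add() returns the new index, so discard it
$list.AddRange(@('line 2', 'line 3'))      # AddRange() appends each element

# With a plain array, += builds a brand-new array on every call,
# copying all existing elements each time.
$array = @()
$array += 'line 1'
$array += 'line 2'
$array += 'line 3'
```

Both end up holding the same three lines; the difference only becomes noticeable on large files, where the repeated copying adds up.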

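The heart of the code begins with the foreach loop, which iterates over each object in $Reports. Each current object is stored in $line, so $line is a MatchInfo object. $section is an array of two line numbers, running from the current $report match to the next available FOOTER match (offset by -1 because array indexes start at 0). The if statements inside the loop just handle certain conditions, such as $report matching the first or second line of the file, or the first or second line of the next section. The foreach loop ultimately outputs all the text leading up to the first $report match, the text within each $report match including its FOOTER match, and all the text between matches.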
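The $section calculation can be hard to read on one line, so here it is broken into steps. This is only a sketch: the line numbers are hypothetical values of the kind Select-String would report, and the .Where() method needs PowerShell 4 or later.

```powershell
# Hypothetical 1-based line numbers, as Select-String would report them.
$reportLine  = 5
$footerLines = 1, 4, 7

# .Where() in 'First' mode stops at the first element that satisfies the filter,
# giving the first footer located after the report header.
$nextFooter = $footerLines.Where({ $_ -gt $reportLine }, 'First')[0]

# Convert both line numbers to 0-based array indexes.
$section = ($reportLine - 1), ($nextFooter - 1)   # 4, 6
```

$data[$section[0]..$section[1]] then selects the report from its header through its footer.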

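The if statements after the foreach loop add the rest of the file, beyond the last match, to $output.

Problems with your first attempt:

In your attempt, what caused the problem was the order of the reports in the file. If ReportType1 appears anywhere after ReportType2 in the file, the first If statement will always be true. You are not checking a block of lines. Instead, you are checking all of the remaining lines starting from a certain line. I will illustrate what I mean with an example:

Below is a sample file with line numbers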

1. <footer>
2. <ReportType2>
3. data
4. data
5. <footer>
6. <ReportType1>
7. data
8. <footer>

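After reaching the first footer, your start line will be 1. You then read all lines after skipping 1, which includes lines 2 and 6. ($line | select-object -skip 1) -match "ReportType1" will find a match and evaluate to $true in the if statement. In the next foreach loop, you iterate until startline becomes 5. Then ($line | select-object -skip 5) -match "ReportType1" will also find a match. The only way your logic works is if the ReportType1 section comes before the ReportType2 section in the file.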
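The behavior described above comes from how -match treats an array on its left side: it returns the matching elements rather than a Boolean, and a non-empty result counts as true. The sketch below reproduces that with the sample file (the data lines are made up) and then shows one possible single-pass alternative that tracks which report type the current line belongs to. It uses the question's placeholder names and is not the code from the answer.

```powershell
# The sample file from above, with invented data lines.
$data = '<footer>', '<ReportType2>', 'OldWord in type 2', 'data', '<footer>',
        '<ReportType1>', 'OldWord in type 1', '<footer>'

# With an array on the left, -match filters the array instead of returning $true/$false.
$rest = $data | Select-Object -Skip 1
$hits = $rest -match 'ReportType1'     # contains '<ReportType1>' from further down the file
$ifTakesBranch = [bool]$hits           # non-empty, so the if branch is taken

# Single-pass alternative: test one line at a time and remember the current report type.
$inType1 = $false
$output = foreach ($line in $data) {
    if ($line -match 'ReportType1') { $inType1 = $true }

    if ($inType1) {
        $line.Replace('OldWord', 'NewWord')   # literal, case-sensitive replacement
    }
    else {
        $line                                 # copy the line unchanged
    }

    if ($line -match 'FOOTER') { $inType1 = $false }   # footer ends the current report
}
```

Because each comparison here is against a single string, -match returns a plain Boolean, and the order of the reports in the file no longer matters.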
