Select-String from pattern A to pattern B

Time: 2015-08-04 12:23:30

Tags: xml powershell-v2.0 select-string

Is there a way to use Select-String to find all the lines between X and Y?

For example, if I have a file containing:

[line 157: Time 2015-08-04 11:34:00] 
<staff>
    <employee>
        <Name>Bob Smith</Name>
        <function>management</function>
        <age>39</age>
        <birthday>3rd June</birthday>
        <car>yes</car>
    </employee>
    <employee>
        <Name>Sam Jones</Name>
        <function>security</function>
        <age>24</age>
    </employee>
    <employee>
        <Name>Mark Perkins</Name>
        <function>management</function>
        <age>32</age>
    </employee>
</staff>

I want to find everything surrounding <function>management</function>, so that I end up with:

<employee>
    <Name>Bob Smith</Name>
    <function>management</function>
    <age>39</age>
    <birthday>3rd June</birthday>
    <car>yes</car>
</employee>
<employee>
    <Name>Mark Perkins</Name>
    <function>management</function>
    <age>32</age>
</employee>    

If all the groups were the same size, I could use something like:

Select-String -Pattern '<function>management</function>' -CaseSensitive -Context 2,2

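In reality, however, the groups vary in size, so I can't use a fixed number of context lines.

What I really need is a way to return everything from: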

2 rows above my search term
until
the following '</employee>' field

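for every matching instance.

Is this possible?

I can't use the standard XML tools in PowerShell because the file I'm reading isn't standard XML, which is why I included the [line 157: Time 2015-08-04 11:34:00] line in the example. The best way to think of it is as a large number of XML files merged into a single file and separated by [line . . .] headers.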

Additional information: I'm afraid my example was a bit too simplified. The actual file looks more like this:

[line 157: Time 2015-08-04 11:34:00]
<?xml version="1.0" encoding="utf-8"?>
<other>
    <stuff>
    . . .
    </stuff>
</other>

<?xml version="1.0" encoding="utf-8"?>
<staff>
    <employee>
    ...
    </employee>
</staff> 

<staff>
    <employee>
    ...
    </employee>
</staff>
[line End: Time 2015-08-04 11:34:00]

Additional information: I've added code to ignore the <?xml version . . . lines. I also tried adding my own root element:

$first = "<open>"
$last = "</open>"
$a = 0

. . .

if($a -eq 0)
    {
        $XmlFiles[$Index] += $first
        $a++
    } 

. . .

$XmlFiles[$Index] += $last

But this produces the error "Array assignment failed because index '-1' was out of range."
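One plausible cause, assuming the elided parts of the script still start with $Index = -1 and an empty $XmlFiles array, is that a line gets appended to $XmlFiles[$Index] before any [line . . .] header has incremented the index. The sketch below guards against that case; the $lines array is hypothetical input made up for illustration, not taken from the real file.

```powershell
$first = "<open>"
$XmlFiles = @()
$Index = -1

# Hypothetical input: one stray line appears before the first [line ...] header.
$lines = @(
    '<?xml version="1.0" encoding="utf-8"?>',
    '[line 157: Time 2015-08-04 11:34:00]',
    '<staff/>'
)

foreach ($line in $lines) {
    if ($line -match '^\[line\ \d+') {
        # Boundary: start a new document that begins with the wrapper tag.
        $Index++
        $XmlFiles += $first
    }
    elseif ($Index -ge 0) {
        # Append only once at least one boundary has been seen,
        # so $XmlFiles[-1] is never assigned on an empty array.
        $XmlFiles[$Index] += $line
    }
}

$XmlFiles
```

With the guard in place, anything that precedes the first header is silently dropped rather than raising the error.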

Additional information: The end result looks like this:

$FilePath = "C:\Path\To\XmlDocs.txt"
$XmlFiles = @()
$Index = -1

$first = "<open>"
$last = "</open>"

# Go through the file and store the individual xml documents in a string array
$a=0
Get-Content $FilePath | `
%{
    if($_ -match "^\[line\ \d+")
        {
            if($a -eq 0)
                {
                    #if this is the top line, ignore it
                }
            else
                {
                    #if this is a boundary, add a closing </open> tag
                    $XmlFiles[$Index] += $last
                }
            # We've got a boundary, move to next index in array
            $Index++
            # Add a new string to hold the next xml document
            $XmlFiles += ""
            # Add an <open> tag
            $XmlFiles[$Index] += $first
            $a++
        } 
    elseif ($_ -match '^\<\?xml') #ignore xml headers
        {
            # End of Section, or XML Header. Do Nothing and move on
        }
    elseif([string]::IsNullOrEmpty($_))
        {
            # Blank Line, Do Nothing and move on
        }
    else 
        {
            # Add each line to the string (xml doesn't care about line breaks)
            $XmlFiles[$Index] += $_
        }
}

# add the final </open> tag
$XmlFiles[$Index] += $last

$a=0
$Results = foreach($File in $XmlFiles)
{
    # Parse the trimmed string as an Xml document
    $Xml = [xml]($File.Trim())
    # Use Xpath to find the manager
    $Xml.SelectNodes("//employee[function = 'management']") |% {$_}
    $a++
}

$Results

It basically ignores the [line . . . headers, the <?xml declarations and any blank lines, and wraps each section in <open> . . . </open> tags to make it valid XML.

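1 answer:

Answer 0: (score: 1)

I think you're overestimating the challenge of parsing the individual Xml documents as actual XML. You can read the file line by line and use the "[line ...]" strings as boundaries between the individual documents: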

$FilePath = "C:\Path\To\XmlDocs.txt"
$XmlFiles = @()
$Index = -1

# Go through the file and store the individual xml documents in a string array
Get-Content $FilePath |%{
    if($_ -match "^\[line\ \d+"){
        # We've got a boundary, move to next index in array
        $Index++
        # Add a new string to hold the next xml document
        $XmlFiles += ""
    } else {
        # Add each line to the string (xml doesn't care about line breaks)
        $XmlFiles[$Index] += $_
    }
}

$Managers = foreach($File in $XmlFiles){
    # Parse string as an Xml document
    $Xml = [xml]$File
    # Use Xpath to find the manager
    $Xml.SelectNodes("//employee[function = 'management']") |% {$_}
}

With a sample file like this (a modified and extended version of your example):

[line 157: Time 2015-08-04 11:34:00] 
<staff>
    <employee>
        <Name>Bob Smith</Name>
        <function>management</function>
        <age>39</age>
        <birthday>3rd June</birthday>
        <car>yes</car>
    </employee>
    <employee>
        <Name>Sam Jones</Name>
        <function>security</function>
        <age>24</age>
    </employee>
    <employee>
        <Name>Mark Perkins</Name>
        <function>management</function>
        <age>32</age>
    </employee>
</staff>
[line 158: Time 2015-08-06 12:36:30] 
<staff>
    <employee>
        <Name>Rob Smith</Name>
        <function>management</function>
        <age>39</age>
        <birthday>3rd June</birthday>
        <car>yes</car>
    </employee>
    <employee>
        <Name>Cam Jones</Name>
        <function>security</function>
        <age>24</age>
    </employee>
    <employee>
        <Name>Stark Perkins</Name>
        <function>management</function>
        <age>32</age>
    </employee>
</staff>

The resulting $Managers would be:

PS C:\> $Managers|Select Name,function,age

Name                               function                          age
----                               --------                          ---
Bob Smith                          management                        39
Mark Perkins                       management                        32
Rob Smith                          management                        39
Stark Perkins                      management                        32
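
For cases where parsing as XML is not an option, a text-only approach closer to the original "pattern A to pattern B" question can be sketched with a regular expression over the whole file contents. This is a rough sketch rather than a tested solution for the real file: the $text sample and the $pattern and $blocks names are made up for illustration, and it assumes that <employee> elements are never nested and that the tags appear exactly as written.

```powershell
# Hypothetical sample text standing in for the real file contents.
$text = @'
[line 157: Time 2015-08-04 11:34:00]
<staff>
    <employee>
        <Name>Bob Smith</Name>
        <function>management</function>
        <age>39</age>
    </employee>
    <employee>
        <Name>Sam Jones</Name>
        <function>security</function>
    </employee>
    <employee>
        <Name>Mark Perkins</Name>
        <function>management</function>
    </employee>
</staff>
'@

# (?s) lets '.' match newlines. The (?:(?!</employee>).)*? part refuses to
# step over a closing tag, so a match cannot start in one employee block
# and pick up the search term from a later one.
$pattern = '(?s)<employee>(?:(?!</employee>).)*?<function>management</function>.*?</employee>'

# Collect the text of every matching block into an array of strings.
$blocks = @([regex]::Matches($text, $pattern) | ForEach-Object { $_.Value })

$blocks
```

To run this against a file, the contents need to be loaded as a single string, for example with [System.IO.File]::ReadAllText($FilePath); Get-Content -Raw is only available from PowerShell 3.0 onwards.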