Question

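Summary: Can I start copying at the third occurrence of a text pattern and stop at the fourth occurrence?

I have some Windows event log files that need to be imported into our event manager program, but they are currently not compatible. My logs have a header surrounded by asterisks, and I am trying to count the asterisk lines and start copying after the third one. I then want to write that text to another file.

I also want to stop copying when I hit the 4th line of asterisks, since that marks the end of the information I need.

Sorry if the wording is odd. What I have tried so far is below. The commented-out lines are things I tried that did not work for me.

Header example: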

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Log
* Date/Time Generated: 10/30/2013   12:01 AM
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

Code:

#$log = Get-Content -Path .\filepath
#$asterisk = "* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *"
#$count = Measure-Object -Line $asterisk

#ForEach ($line in $log){
#DO
#{
#  DO{
#  $log | Add-Content .\filepath\test.txt
#  }until($count -eq 4)


#}until($count -eq 4)}

#$LogFile = Get-Content -Path .\filepath
$Asterisks = Get-Content -Path .\filepath | Select-String -Pattern "\* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \* \*"

#DO
#{
#  DO{
#  $Asterisks.Matches.Count
#  Get-Content -Path .\filepath
#  }While($Asterisks.Matches.Count -eq 3)

$Asterisks|Add-Content .\filepath
#}while($Asterisks.Matches.Count -eq 3)

Answer 1

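Based on your comments, I understand your requirement differently. Using a sample similar in style to KevinD's, I assume the log looks something like this.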

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Log
* Date/Time Generated: 10/30/2013 12:01 AM
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Data you want
More data you want
...
oodles of it even
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Data you don't want
More data you REALLY dont want
...
so much crap
...
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

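You want the text located between the sets of double asterisk lines. Let's play with some simple regex. As written, this needs at least PowerShell 3.0 (it can be adjusted for 2.0 if needed).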

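# PowerShell 3.0+: read the whole file as a single string
$log = Get-Content -Path .\filepath -Raw
# PowerShell 2.0: use this line instead of the one above
# $log = (Get-Content -Path .\filepath) -join "`r`n"
# Non-capturing group, so that $Matches[1] below is the data and not " *"
$asteriskLine = '\*(?: \*){38}'
If($log -match ("(?sm){0}\r?\n{0}(.*?){0}\r?\n{0}" -f $asteriskLine)){
   $Matches[1]
}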

Which matches:

Data you want
More data you want
...
oodles of it even

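$asteriskLine is my attempt to tidy up the long string you had and to remove the need to escape all of those characters by hand. I originally built a string, converted it to an array, and joined it back together with spaces.

The purpose of the regex is to collect the text that appears after the first set of asterisk lines, up to the next set. To make the regex easier to read, we use the format operator so that we do not need one long string full of escaped \*.

Using -match is a simple way to test for a match and get the result through $Matches[1], since the data is contained in the capture group (.*?).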
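To try the pattern without touching a real log, the same kind of match can be run against a small in-memory sample. This is only a sketch: the sample text, the shortened three-asterisk separator, and the variable names are made up for illustration, and a non-capturing group is used so that the only capture group is the one holding the data.

```powershell
# Hypothetical sample: a shortened separator ("* * *") stands in for the 39-asterisk line.
$sep = '* * *'
$sample = @(
    $sep, '* Log', $sep, $sep,
    'Data you want', 'More data you want',
    $sep, $sep,
    "Data you don't want"
) -join "`r`n"

# Non-capturing group, so the only capture group in the final pattern is (.*?).
$sepPattern = '\*(?: \*){2}'
$pattern = "(?s){0}\r?\n{0}\r?\n(.*?)\r?\n{0}\r?\n{0}" -f $sepPattern

$wanted = $null
if ($sample -match $pattern) {
    # Group 1 is the text between the first and second double separator.
    $wanted = $Matches[1]
}
$wanted
```

Putting the line breaks outside the capture group keeps the leading and trailing newlines out of the result, which may or may not matter for the import.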

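A note on inefficiency

Using Get-Content this way on a file that large is considered very inefficient. However, the code I provided should be straightforward to understand. You could also look at StreamReader and set a flag when you see the asterisk group. It all depends on your needs and patience.

Update from the comments

Ansgar may well have had a good idea :) about how to make $asteriskLine simpler. Not even sure why it did not occur to me.

StreamReader

I have not used this much, since I do not work with large files. Assuming your log really does look like my sample above, this should work fine.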

$filePath = "c:\temp\text.txt"
$outputFile = "C:\temp\outputfile.txt"
$asteriskLine = '\*( \*){38}'
$file = New-Object System.IO.StreamReader -Arg $filePath
[boolean]$flagReadData = $False
$asteriskRepeatCount = 0

# Compare against $null so that an empty line does not end the loop early
while (($line = $file.ReadLine()) -ne $null) {
    # Check if this line is an asterisk line
    If($line -match $asteriskLine){
        # Raise the asterisk count
        $asteriskRepeatCount++

        # Check to see if we have found two in a row
        If ($asteriskRepeatCount -eq 2){
            # We have just found 2 repeating lines of $asteriskLine. Check the readData flag
            If($flagReadData){
                # We have hit the end of the data block and we can stop.
                $flagReadData =  $False
                break
            } Else {
                # Start recording the lines.
                $flagReadData =  $true
            }

            # Reset the count. 
            $asteriskRepeatCount = 0
        }

    } Else {
        # Current line does not match. Reset the count.
        $asteriskRepeatCount = 0
    }

    # Pass line if criteria are met. 
    If($asteriskRepeatCount -eq 0 -and $flagReadData -and $line -notmatch $asteriskLine){
        $line | Add-Content $outputFile
    }
}
$file.close()

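Basically, it reads the file one line at a time. The script keeps a count of the asterisk lines it encounters consecutively. The first time it finds two in a row, it sets a Boolean flag. While that flag is true, it outputs every line it reads. When it finds the next set of double asterisk lines, it stops reading the file.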
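The same flag-and-counter idea can be wrapped in a function that accepts any System.IO.TextReader, which makes it easy to try on a string before pointing it at a real file. The function name Get-BlockBetweenDoubleSeparators and its parameters are hypothetical, and the sketch assumes the log layout shown in the sample above.

```powershell
function Get-BlockBetweenDoubleSeparators {
    param(
        [System.IO.TextReader]$Reader,
        [string]$SeparatorPattern = '^\*(?: \*){38}$'
    )
    $consecutive = 0      # separator lines seen in a row
    $recording = $false   # true once the first double separator has been passed
    # Compare against $null so that empty lines do not end the loop early.
    while ($null -ne ($line = $Reader.ReadLine())) {
        if ($line -match $SeparatorPattern) {
            $consecutive++
            if ($consecutive -eq 2) {
                if ($recording) { break }   # second double separator: done
                $recording = $true
                $consecutive = 0
            }
        } else {
            $consecutive = 0
            if ($recording) { $line }       # emit data lines to the pipeline
        }
    }
}

# Usage with an in-memory reader; a StreamReader for a real file works the same way.
$sepLine = ('*' * 39).ToCharArray() -join ' '
$text = @($sepLine, '* Log', $sepLine, $sepLine, 'first', '', 'second', $sepLine, $sepLine, 'ignored') -join "`n"
$reader = New-Object System.IO.StringReader -ArgumentList $text
$block = @(Get-BlockBetweenDoubleSeparators -Reader $reader)
$reader.Dispose()
$block
```

Emitting lines to the pipeline instead of calling Add-Content per line leaves the caller free to write the result once, for example with Set-Content.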

Answer 2

If I understand correctly, your log file looks like this:

* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Log
* Date/Time Generated: 10/30/2013 12:01 AM
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Data you want
More data you want
...
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Data you don't want
More data you don't want
...
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* Log
* Date/Time Generated: 10/30/2013 12:02 AM
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
Second set of data you want
...
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
...

Assuming that is correct and you want all of the data saved to the same file, this should do it:

$log = Get-Content -Path .\filepath
$asterisk = "* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *"
$count = 0

foreach ($line in $log) {
    If ($count -eq 3 -and $line -ne $asterisk) {
        $line | Add-Content .\filepath\test.txt
    }

    If ($line -eq $asterisk) {
        $count++
    }

    If ($count -eq 4) {
        $count = 0
    }

}

If I misunderstood and you only want the first set of data, replace "$count = 0" with "break".
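For the first-block-only case, a version that works on an in-memory array makes the counting easy to check. The sample lines and the $kept variable are illustrative, and a short separator is used in place of the full asterisk line.

```powershell
# Illustrative data: '* * *' stands in for the full asterisk line.
$asterisk = '* * *'
$log = @(
    $asterisk, '* Log', $asterisk, $asterisk,
    'Data you want', 'More data you want',
    $asterisk,
    "Data you don't want"
)

$count = 0
$kept = foreach ($line in $log) {
    if ($line -eq $asterisk) {
        $count++
        # The fourth separator ends the block, so stop instead of resetting the counter.
        if ($count -eq 4) { break }
    } elseif ($count -eq 3) {
        $line   # lines after the third separator are collected
    }
}
$kept
```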

Answer 3

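Very similar to Matt's in that it is regex-based, but I simply split the file on the asterisk lines, drop the blank results, skip the first result (the Log and Date/Time lines), and select only the next item (which, given the sample Matt used, should be exactly what you need). Once again we use the -Raw parameter available in v3+, or, if you are on v2, you can -join the lines with newlines to create a multi-line string.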

(Get-Content C:\Path\To\File.log -raw) -split "(?m)^\*(?: \*){38}"|?{!([string]::IsNullOrWhiteSpace($_))}|Select -skip 1 -first 1

Or in v2 (where [string]::IsNullOrWhiteSpace is not available by default, so the blank check uses a regex instead)...

(Get-Content C:\Path\To\File.log) -join "`r`n" -split "(?m)^\*(?: \*){38}"|?{$_ -match '\S'}|Select -skip 1 -first 1

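Either way, based on Matt's sample, you get:

Data you want
More data you want
...
oodles of it even

Now you can pipe that to Set-Content to write it to a file, assign it to a variable, or whatever else you need.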
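As a sketch of the Set-Content step, the snippet below builds a small sample file in the temp directory, extracts the block with the same kind of split, and writes it out. The file names and the shortened separator are assumptions made for the example; on a real log the pattern would be the 39-asterisk one shown earlier. It uses -Raw, so it needs v3 or later.

```powershell
# Hypothetical paths in the temp directory, used only for this demonstration.
$inFile  = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-log.txt'
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-excerpt.txt'

# Short separator ('* * *') in place of the full asterisk line.
$sep = '* * *'
@($sep, '* Log', $sep, $sep, 'Data you want', 'More data you want', $sep, $sep, 'Other data') |
    Set-Content -Path $inFile

# Split on separator lines, drop blank pieces, skip the header piece, take the next one.
$excerpt = (Get-Content -Path $inFile -Raw) -split '(?m)^\*(?: \*){2}\r?$' |
    Where-Object { $_ -match '\S' } |
    Select-Object -Skip 1 -First 1

# Trim the newlines left over from the split before saving.
$excerpt.Trim() | Set-Content -Path $outFile
Get-Content -Path $outFile
```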

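If your log is very large and you are sure your data is within the first, say, 1000 lines or so, you may want to consider using the -TotalCount parameter of the Get-Content cmdlet. Because -TotalCount limits the number of lines returned, this uses -join rather than -Raw to build the string, which changes the start of the line to:

((Get-Content C:\Path\To\File.log -TotalCount 1000) -join "`r`n")

This reads only the first 1000 lines, which may speed things up considerably if what you are interested in is at the beginning of the file. But again, this is only an option if you are sure your data is within the first X lines of the file.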
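A small sketch of the -TotalCount idea, using a generated file so the effect is easy to see. The temp file name is hypothetical, and the lines are joined afterwards to get a single string for -split or -match.

```powershell
# Hypothetical demo file with 5000 numbered lines.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'totalcount-demo.txt'
1..5000 | ForEach-Object { "line $_" } | Set-Content -Path $path

# Read only the first 1000 lines, then join them into one multi-line string.
$head = Get-Content -Path $path -TotalCount 1000
$text = $head -join "`r`n"
"Lines read: $($head.Count)"
```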

Answer 4

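Although you did not say so, my answer assumes that none of the data you want to copy to the other file starts with an asterisk. If it does, no problem: you can adjust $rx slightly to match different requirements.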
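One possible adjustment, offered as an assumption rather than something taken from this answer, is to anchor the pattern so that only lines that begin with asterisks count as separators. The sample lines and variable names here are invented for the comparison.

```powershell
$unanchored = '\s*\*+'    # matches an asterisk anywhere in the line
$anchored   = '^\s*\*+'   # matches only when the line starts with optional whitespace, then asterisks

# Invented sample lines: a separator, a data line containing '*', and an indented asterisk line.
$lines = @(
    '********************************',
    'total = price * quantity',
    '   * indented bullet'
)

$unanchoredHits = @($lines | Where-Object { $_ -match $unanchored })
$anchoredHits   = @($lines | Where-Object { $_ -match $anchored })
"Unanchored matches: $($unanchoredHits.Count)"
"Anchored matches:   $($anchoredHits.Count)"
```

With the unanchored form, a data line that merely contains an asterisk is treated as a separator, which may end the copy earlier than intended.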

With that assumption, the code you need is very simple. First, define the initial variables:

$file,$rx,$flag=
    'c:\...\CrapLog.log',
    '\s*\*+',
    $false

That was easy.

Now, the ONE-LINER:

switch -r -f($file){$rx{if($flag){break}else{continue}}default{$flag=$true;$_}}

Well, if you would rather see the one-liner spread over several lines, here it is:

switch -r -f($file){
    $rx{if($flag){break}else{continue}}
    default{$flag=$true;$_}
}

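That's it.

By the way, the code above outputs all of the lines you want, and you can redirect them to another file (as you wanted).

Why write many, many lines if you can write it with less typing? After all, this is scripting.

The switch statement is very powerful...

ADDED: a simplified switch statement: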

switch -r -f($file){ $rx{if($flag){break}} default{$flag=$true;$_} }

ADDED: an example that writes the output to another file:

$file,$rx,$flag=
    'c:\...\example.txt',
    '\s*\*+',
    $false

$fileOUT='c:\...\excerpt.txt'

$lines=switch -r -f($file){
    $rx{if($flag){break}}
    default{$flag=$true;$_}
}

$lines >$fileOUT

where example.txt might look like this:

********************************
* Company Contoso
********************************
********************************
I want this line 1
I want this line 2
I want this line 3
I want this line 1000 (or even more)
********************************
I DON'T want this line and any other below this one: 1
I DON'T want this line and any other below this one: 2
********************************
I DON'T want this line and any other below this one: 3
********************************
********************************
I DON'T want this line and any other below this one: 4
I DON'T want this line and any other below this one: 5
********************************
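To try the switch approach end to end, the sketch below writes a shortened version of such a file to the temp directory and runs the simplified switch against it. The temp file name and the shortened content are assumptions made for the demonstration, and the pattern is anchored with ^, which is a small variation on the $rx used above.

```powershell
# Hypothetical demo file, a shortened version of example.txt.
$file = Join-Path ([System.IO.Path]::GetTempPath()) 'switch-demo.txt'
@(
    '*****', '* Company Contoso', '*****', '*****',
    'I want this line 1', 'I want this line 2',
    '*****',
    "I DON'T want this line"
) | Set-Content -Path $file

$rx   = '^\s*\*+'
$flag = $false
$lines = switch -Regex -File $file {
    $rx     { if ($flag) { break } }   # separator after data has started: stop reading
    default { $flag = $true; $_ }      # first non-separator line starts the output
}
$lines
```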

How do I copy text after a given line count?