Question

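How can I use PowerShell to sum values taken from a substring of certain lines in a file and place the total at a specific position on a different line, given the following conditions:

Sum the numbers found at positions 3 through 13 of the lines that start with the character D. Place the sum at positions 10 through 14 of the line that starts with S.

For example, if I have this file: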

F123trial   text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S        xxxxx

I want to get the sum of 38.95, 18.95 and 18.95, and then place that sum where xxxxx appears on the line that starts with S.

Answer 1

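You could try:

-match with a regex pattern, to select the lines
The .NET string method Substring(), to extract the value from the "D" lines
Measure-Object -Sum, to compute the sum
-replace, to insert the value (searching with a regular expression).

For example: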

$text = Get-Content -Path file.txt

$total = $text -match '^D' |
    # For each "D" line, extract the value and cast it to double so it can be summed
    ForEach-Object { $_.Substring(2, 11) -as [double] } |
    # Compute the sum
    Measure-Object -Sum | Select-Object -ExpandProperty Sum

# Format with two decimals and "." as the decimal mark, regardless of culture
$sumText = $total.ToString('0.00', [cultureinfo]::InvariantCulture)

$text | ForEach-Object {
    if ($_ -match '^S') {
        # Line starts with S -> put the right-aligned sum at positions 10-14
        $line = $_.PadRight(14)
        $line.Substring(0, 9) + $sumText.PadLeft(5) + $line.Substring(14)
    } else {
        # Not an "S" line -> output the original content
        $_
    }
} | Set-Content -Path file.txt
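The approach above can be checked against the sample file from the question. The following is a minimal sketch, not part of the original answer: it assumes the file layout shown in the question, writes it to a temporary file, and uses [decimal] instead of [double] so that the sum is exact. The variable names $sample, $tmp, $sumText and $result are hypothetical.

```powershell
# Hypothetical test harness: write the sample from the question to a temp file.
$sample = @(
    'F123trial   text'
    'DA00000038.95==xxx11'
    'DA00000018.95==yyy11'
    'DA00000018.95==zzzyy'
    'S        xxxxx'
)
$tmp = [IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value $sample

$text = Get-Content -Path $tmp

# Sum positions 3-13 of every "D" line, parsing culture-invariantly as decimal.
$total = [decimal] 0
foreach ($line in $text) {
    if ($line -match '^D') {
        $total += [decimal]::Parse($line.Substring(2, 11), [cultureinfo]::InvariantCulture)
    }
}
$sumText = $total.ToString('0.00', [cultureinfo]::InvariantCulture)

# Rebuild the lines, overwriting positions 10-14 of the "S" line.
$result = foreach ($line in $text) {
    if ($line -match '^S') {
        $padded = $line.PadRight(14)
        $padded.Substring(0, 9) + $sumText.PadLeft(5) + $padded.Substring(14)
    } else {
        $line
    }
}

$result          # last line is: 'S        76.85'
Remove-Item $tmp
```

Note that a sum wider than five characters would extend past position 14 in this sketch; whether to truncate, widen the field, or fail in that case depends on the file format.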

Answer 2

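PowerShell's switch statement has powerful but little-known features that let you iterate over the lines of a file (-file) and match each line against regular expressions (-regex).

switch -file is not only convenient, it is also much faster than using cmdlets in a pipeline (see the next section).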

[double] $sum = 0

switch -regex -file ('file.txt') {

  # Note: The string to the left of each script block below ({ ... }),
  #       e.g., '^D', is the regex to match each line against.
  #       Inside the script blocks, $_ refers to the input line at hand.

  # Extract number, add to sum, output the line.
  '^D' { $sum += $_.Substring(2, 11); $_; continue }

  # Summary line: place sum at character position 10, with 0-padding
  # Note: `-replace ',', '.'` is only needed if your culture uses "," as the
  #       decimal mark.
  '^S' { $_.Substring(0, 9) + '{0:000000000000000.00}' -f $sum -replace ',', '.'; continue }

  # All other lines: pass them through.
  default { $_ }

}

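Note:

* continue in the script blocks short-circuits further matching for the line at hand; by contrast, if you used break, no further lines would be processed.
* Based on a later comment, I'm assuming you want an 18-character number, left-padded with 0s, placed at character position 10 on the S line.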
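The difference between continue and break is easiest to see on a small array instead of a file. This is a minimal sketch under that assumption; the input $lines and the variable names $withContinue and $withBreak are hypothetical and not part of the original answer.

```powershell
# Hypothetical input: three strings, processed element by element.
$lines = 'D1', 'S1', 'D2'

# continue: stop matching the current element, move on to the next one.
$withContinue = switch -regex ($lines) {
    '^D'    { "D-line: $_"; continue }
    '^S'    { "S-line: $_"; continue }
    default { "other: $_" }
}

# break: leave the switch statement altogether after the first match.
$withBreak = switch -regex ($lines) {
    '^D'    { "D-line: $_"; break }
    '^S'    { "S-line: $_"; break }
    default { "other: $_" }
}

$withContinue.Count   # 3 (all elements were processed)
@($withBreak).Count   # 1 (only 'D1' was processed)
```

The same behavior applies with -file, where each line of the file plays the role of one array element.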

With your sample file, the above yields:

F123trial   text
DA00000038.95==xxx11
DA00000018.95==yyy11
DA00000018.95==zzzyy
S        000000000000076.85

Optional reading: comparing the performance of switch -file ... with Get-Content ... | ForEach-Object ...

Run the following test script:

& {
  # Create a sample file with 100K lines.
  1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())
  (Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds,
  (Measure-Command { get-content $tmpFile | % { $_ } }).TotalSeconds
  Remove-Item $tmpFile
}
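On my machine, for example, this yields the following timings (the absolute numbers are not important, but their ratio should give you a sense of the difference):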

0.0578924   # switch -file
6.0417638   # Get-Content | ForEach-Object

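That is, the pipeline-based solution is about 100 (!) times slower than the switch -file solution.

Digging deeper:

Frode F. points out that Get-Content is slow with large files (although its convenience makes it a popular choice) and mentions using the .NET Framework directly as an alternative:

Using [System.IO.File]::ReadAllLines(); however, given that it reads the entire file into memory, this is only an option with small files.

Using [System.IO.StreamReader]'s ReadLine() method in a loop.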
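A StreamReader loop applied to the summing task from the question might look like the following. This is a minimal sketch, not taken from the answer: it assumes the fixed-width layout from the question, creates its own hypothetical sample file, and uses [decimal] for an exact sum. The names $path, $reader and $sum are illustrative.

```powershell
# Hypothetical sample file so that the sketch is self-contained.
$path = [IO.Path]::GetTempFileName()
Set-Content -Path $path -Value 'DA00000038.95==xxx11', 'DA00000018.95==yyy11', 'S        xxxxx'

$sum = [decimal] 0
# .NET resolves relative paths against the process's working directory, not
# PowerShell's current location, so pass a full path (Convert-Path).
$reader = New-Object System.IO.StreamReader -ArgumentList (Convert-Path $path)
try {
    # ReadLine() returns $null at the end of the file.
    while ($null -ne ($line = $reader.ReadLine())) {
        if ($line.StartsWith('D')) {
            $sum += [decimal]::Parse($line.Substring(2, 11), [cultureinfo]::InvariantCulture)
        }
    }
} finally {
    # Always release the file handle, even if parsing fails.
    $reader.Dispose()
}

$sum   # the sum of the two D lines: 38.95 + 18.95
Remove-Item $path
```

Only one line is held in memory at a time, which is what makes this approach suitable for large files.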

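However, use of the pipeline in itself introduces overhead, regardless of the specific cmdlets involved. When performance matters, and only then, you should avoid it.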

Here is an updated test that includes commands using the .NET Framework methods, with and without the pipeline (use of the collection operator .ForEach() requires PSv4+):

& {
  # Create a sample file with 100K lines.
  1..1e5 > ($tmpFile = [IO.Path]::GetTempFileName())

  (Measure-Command { switch -file ($tmpFile) { default { $_ } } }).TotalSeconds

  (Measure-Command {
    $sr = [IO.StreamReader] (Convert-Path $tmpFile)
    while(-not $sr.EndOfStream) { $sr.ReadLine() }
    $sr.Close()
  }).TotalSeconds

  (Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)).ForEach({ $_ }) }).TotalSeconds

  (Measure-Command { [IO.File]::ReadAllLines((Convert-Path $tmpFile)) | % { $_ } }).TotalSeconds

  (Measure-Command { Get-Content $tmpFile | % { $_ } }).TotalSeconds

  Remove-Item $tmpFile
}

Sample results, from fastest to slowest:

0.0571143   # switch -file
0.2035162   # [System.IO.StreamReader] in a loop
0.6756535   # [System.IO.File]::ReadAllLines() with .ForEach() collection operator
1.5088355   # (pipeline) [System.IO.File]::ReadAllLines() with ForEach-Object
5.9815751   # (pipeline) Get-Content with ForEach-Object

switch -file is the fastest by a factor of about 3, followed by the .NET + loop solution; using .ForEach() adds another factor of 3. Simply introducing the pipeline (ForEach-Object instead of .ForEach()) adds another factor of 2; finally, using the pipeline with Get-Content and ForEach-Object adds another factor of 4.
