Find and replace using PowerShell Get-Content

Time: 2014-06-23 11:43:13

Tags: powershell




(get-content C:\TrainingFile\TrainingFile.txt) | foreach-object {$_ -replace "123-45-6789", "666-66-6666"} | set-content C:\TrainingFile\TrainingFile.txt

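My problem is that I have 17,000 lines of this code in a .ps1 file; the file consists of commands like the one above, one per SSN.

I tested just one command, and it took about 15 seconds to execute. Doing the math, 17,000 x 15 seconds comes to about 3 days to run the .ps1 script with its 17,000 commands.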
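
The estimate can be checked with a quick calculation. This is only a sketch of the arithmetic; the variable names are made up for illustration, and the 15-second figure is the single measurement quoted in the question.

```powershell
$commands = 17000           # one Get-Content / Set-Content pass per SSN
$secondsPerCommand = 15     # measured once, for a single command

# Convert the total number of seconds into a TimeSpan and read it back in days.
$total = [timespan]::FromSeconds($commands * $secondsPerCommand)
$total.TotalDays            # roughly 2.95 days
```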


4 answers:

Answer 0 (score: 2)


select SSN (X) and masked SSN (X') from a list
read all rows from file
search each file row for string X
if found, replace with X'
save all rows to file
loop until all SSNs are processed



read all rows from file
loop through rows
for current row, change X -> X'
save the result

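Why would this be faster? 1) You read and save the file only once. Disk I/O is slow. 2) You process each row only once, so no extra work is done. As for how to actually perform the X -> X' transformation, you will have to define more carefully what the masking rules are.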



Since you already know the f(X) -> X' results, you should save the precomputed list to disk, like so:

ssn, mask
"123-45-6789", "666-66-6666"
"223-45-6789", "666-66-6667"

Import the file into a hashtable and carry on by stealing all the juicy bits from Ansgar's answer:

$ssnMask = @{}
$ssn = import-csv "c:\temp\SSNMasks.csv" -delimiter ","

# Add X -> X' to hashtable
$ssn | % {
  if(-not $ssnMask.ContainsKey($_.ssn)) {
    # It's an error to add existing key, so check first 
    $ssnMask.Add($_.ssn, $_.mask)
  }
}

$dataToMask = get-content "c:\temp\training.txt"
$dataToMask | % {
   if ( $_ -match '(\d{3}-\d{2}-\d{4})' ) {
     # Replace SSN look-a-like with value from hashtable
     # NB: This simply removes SSNs that don't have a match in hashtable
     $_ -replace  $matches[1], $ssnMask[$matches[1]]
   } else {
     # No SSN look-a-like on this line, so pass it through unchanged
     $_
   }
} | set-content "c:\temp\training2.txt"
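
The loop above rewrites only the SSN that -match finds first on a line, and (as its comment notes) it blanks out SSNs that are missing from the hashtable. If a line can hold several different SSNs, one possible variation is to let [regex]::Replace call a match evaluator for every match. The following is a sketch under assumptions, not part of the original answer: the mapping entries and the sample line are invented, and unknown SSNs are deliberately left unchanged.

```powershell
# Hypothetical mapping table; in the answer above it is loaded from SSNMasks.csv.
$ssnMask = @{
    '123-45-6789' = '666-66-6666'
    '223-45-6789' = '666-66-6667'
}

$pattern = [regex]'\d{3}-\d{2}-\d{4}'

# Called once per match: return the mask if one exists, otherwise keep the original text.
$evaluator = [System.Text.RegularExpressions.MatchEvaluator] {
    param($m)
    if ($ssnMask.ContainsKey($m.Value)) { $ssnMask[$m.Value] } else { $m.Value }
}

# Hypothetical input line with two known SSNs and one unknown SSN.
$line = 'a 123-45-6789 b 223-45-6789 c 999-99-9999'
$masked = $pattern.Replace($line, $evaluator)
$masked   # a 666-66-6666 b 666-66-6667 c 999-99-9999
```

Because every match becomes a single hashtable lookup, each line is still processed only once regardless of how many SSNs it contains.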

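Answer 1 (score: 0)

Avoid reading and writing the file multiple times. I/O is expensive, and that is what slows your script down. Try something like this: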

$filename = 'C:\TrainingFile\TrainingFile.txt'

$ssnMap = @{}
(Get-Content $filename) | % {
  if ( $_ -match '(\d{3}-\d{2}-\d{4})' ) {
    # If SSN is found, check if a mapping of that SSN to a random SSN exists.
    # Otherwise create a new mapping.
    if ( -not $ssnMap.ContainsKey($matches[1]) ) {
      do {
        $rnd = Get-Random -Min 100000 -Max 999999
        $newSSN = "666-$($rnd -replace '(..)(....)','$1-$2')"
      } while ( $ssnMap.ContainsValue($newSSN) )  # loop to avoid collisions
      $ssnMap[$matches[1]] = $newSSN
    }

    # Replace the SSN with the corresponding randomly generated SSN.
    $_ -replace $matches[1], $ssnMap[$matches[1]]
  } else {
    # If no SSN is found, simply print the line.
    $_
  }
} | Set-Content $filename

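If you already have a list of random SSNs and have them mapped to specific "real" SSNs, you can read those mappings from a CSV (example column headers: realSSN, randomSSN) into the $ssnMap hashtable: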

$ssnMap = @{}
Import-Csv 'C:\mappings.csv' | % { $ssnMap[$_.realSSN] = $_.randomSSN }
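
To make the expected file layout concrete, here is a small sketch that builds the same hashtable from inline CSV text instead of a file. The two data rows are invented for illustration; only the realSSN and randomSSN column names come from the answer.

```powershell
# Hypothetical contents of C:\mappings.csv, inlined so the snippet runs without a file.
$csvText = @'
realSSN,randomSSN
123-45-6789,666-66-6666
223-45-6789,666-66-6667
'@

# Same loop as above, fed by ConvertFrom-Csv instead of Import-Csv.
$ssnMap = @{}
$csvText | ConvertFrom-Csv | ForEach-Object { $ssnMap[$_.realSSN] = $_.randomSSN }

$ssnMap['223-45-6789']   # 666-66-6667
```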

Answer 2 (score: 0)


$inputfile = 'C:\TrainingFile\TrainingFile.txt'
$outputfile = 'C:\TrainingFile\NewTrainingFile.txt'

$replacements = Get-Content 'C:\TrainingFile\SSN_Replacements.txt'

# Index of the next replacement string; advances once per input line.
$i = 0

Filter Replace-SSN { $_ -replace '\d{3}-\d{2}-\d{4}',$replacements[$i++] }

Get-Content $inputfile |
Replace-SSN |
Set-Content $outputfile




$inputfile = 'C:\TrainingFile\TrainingFile.txt'
$outputfile = 'C:\TrainingFile\NewTrainingFile.txt'
$replacementfile = 'C:\TrainingFile\SSN_Replacements.csv' 

$SSNmatch = [regex]'\d{3}-\d{2}-\d{4}'

$replacements = @{}

Import-Csv $replacementfile |
 ForEach-Object { $replacements[$_.OldSSN] = $_.NewSSN }

Get-Content $inputfile -ReadCount 1000 |
 ForEach-Object {
  foreach ($Line in $_) {
    if ( $Line -match $SSNmatch ) { #Found SSN in line
      if ( $replacements.ContainsKey($matches[0]) ) { #Found replacement string for this SSN
        $Line -replace $SSNmatch,$replacements[$matches[0]] #Replace SSN and output line
      }
      else { #No replacement found - warn only (the line is not written to the output)
        Write-Warning "Warning - no replacement string found for $($matches[0])"
      }
    }
    else { $Line } #No SSN in this line - output line as-is
  }
 } | Set-Content $outputfile
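
The inner foreach in the script above is needed because -ReadCount makes Get-Content send arrays of lines down the pipeline rather than single lines. A small sketch of that behavior, using a temporary file and a batch size of 2 purely for illustration:

```powershell
# Write a small temporary file (5 lines) to read back in batches.
$tmp = [System.IO.Path]::GetTempFileName()
1..5 | ForEach-Object { "line $_" } | Set-Content $tmp

# With -ReadCount 2, each pipeline object is an array of up to 2 lines.
$batchSizes = Get-Content $tmp -ReadCount 2 | ForEach-Object { $_.Count }

Remove-Item $tmp
$batchSizes -join ','   # 2,2,1
```

Larger batches mean fewer pipeline objects, which is presumably why the answer reads 1000 lines at a time.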

Answer 3 (score: -1)

# Fairly fast PowerShell code for masking up to 1000 SSNs per line in a large text file (with an unlimited number of lines) where the SSN matches the pattern " ###-##-#### ", " ##-####### ", or " ######### ".
# This code can handle a 14 MB text file that has SSNs in nearly every row in about 4 minutes.

# $inputFilename = 'C:/InputFile.txt'

$inputFileName = "
           0550       125665    338066                                                                                               
-                   02 CR05635                                  07/06/16                                                             
0     SAMPLE CUSTOMER NAME                                                                                                   
      PO BOX 12345                                                                                                                  
      ROSEVILLE CA 12345-9109                                                                                                        

 EMPLOYEE DEFERRALS                                                                                        
 FREDDIE MAC RO 16 9385456   164-44-9120     XXX                                                                               
 SALLY MAE RO 95 9385356   07-4719130     XXX                                                                               
 FRED FLINTSTONE RO 95 1185456   061741130     XXX  
 WILMA FLINTSTONE RO 91 9235456   364-74-9130  123456789 123456389 987354321    XXX                                                          
 PEBBLES RUBBLE RO 10 9235456 06-3749130  064-74-9150  034-74-9130  XXX                                                                               
 BARNEY RUBBLE RO 11 9235456 06-3449130 06-3749140 063-74-9130     XXX                                                                               
 BETTY RUBBLE RO 16 9235456   9-74-9140  123456789 123456789 987654321    XXX                                                                               

 PLEASE ENTER BELOW ANY ADDITIONAL PARTICIPANTS FOR WHOM YOU ARE                                                                     
 REMITTING.  FOR GENERAL INFORMATION AND SERVICE CALL
"

$outputFilename = 'D:/OutFile.txt'

#(Get-Content $inputFilename ) | % {

($inputFilename ) | % {
       $NewLine = $_
       # Write-Host "0 new line value is ($NewLine)."

       $ChangeFound = 'Y'
       $WhileCounter = 0

       # Each pass masks one matched value per pattern; repeat until a pass changes nothing (at most 1000 passes).
       While (($ChangeFound -eq 'Y') -and ($WhileCounter -lt 1000))
       {
          $ChangeFound = 'N'
          $WhileCounter++

          # Pattern " ###-##-#### ": keep only the last four digits.
          $found = $NewLine | Select-String -pattern "[ ][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9][ \t\r\n]" -AllMatches
          If ($found)
          {
              $value = $found.Matches[0].Value
              $NewLine = $NewLine -replace [regex]::Escape($value), (" ###-##-" + $value.substring(8))
              $ChangeFound = 'Y'
          }
          # Write-Host "1 new line value is ($NewLine)."

          # Pattern " ##-####### ": keep only the last four digits.
          $found = $NewLine | Select-String -pattern "[ ][0-9][0-9]-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][ \t\r\n]" -AllMatches
          If ($found)
          {
              $value = $found.Matches[0].Value
              $NewLine = $NewLine -replace [regex]::Escape($value), (" ##-###" + $value.substring(7))
              $ChangeFound = 'Y'
          }
          # Write-Host "2 new line value is ($NewLine)."

          # Pattern " ######### ": keep only the last four digits.
          $found = $NewLine | Select-String -pattern "[ ][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][ \t\r\n]" -AllMatches
          If ($found)
          {
              $value = $found.Matches[0].Value
              $NewLine = $NewLine -replace [regex]::Escape($value), (" #####" + $value.substring(6))
              $ChangeFound = 'Y'
          }
          # Write-Host "3 new line value is ($NewLine)."
       } # end of While
       Write-Host "5 new line value is ($NewLine)."

       # Emit the masked text so that Set-Content receives it.
       $NewLine
 } | Set-Content $outputFilename