PowerShell script to extract text between keywords

Posted: 2015-06-25 21:01:20

Tags: powershell

I want to extract data from a txt file and write it out to other txt files. This is the content of the txt file:

HAC 06: CATHETHER-ASSOCIATED URINARY TRACT INFECTION (UTI)
SECONDARY DIAGNOSIS

  T8351XA CC  Infection and inflammatory reaction due to indwelling urinary catheter, initial encounter

OR SECONDARY DIAGNOSIS

  B3741     Candidal cystitis and urethritis
  B3749     Other urogenital candidiasis
  N10   CC  Acute tubulo-interstitial nephritis
  N340  CC  Urethral abscess
  N390  CC  Urinary tract infection, site not specified

WITH SECONDARY DIAGNOSIS

  T8351XA CC  Infection and inflammatory reaction due to indwelling urinary catheter, initial encounter

HAC 07: VASCULAR CATHETHER-ASSOCIATED INFECTION
SECONDARY DIAGNOSIS

  T80211A CC  Bloodstream infection due to central venous catheter, initial encounter
  T80212A CC  Local infection due to central venous catheter, initial encounter
  T80218A CC  Other infection due to central venous catheter, initial encounter
  T80219A CC  Unspecified infection due to central venous catheter, initial encounter

HAC 08: SURGICAL SITE INFECTION-MEDIASTINITIS AFTER CORONARY BYPASS GRAFT (CABG)
PROCEDURES

  0210093 Bypass Coronary Artery, One Site from Coronary Artery with Autologous Venous Tissue, Open Approach
  0210098 Bypass Coronary Artery, One Site from Right Internal Mammary with Autologous Venous Tissue, Open Approach

I want to extract it into three files, one each for the content under HAC 06, HAC 07 and HAC 08.

HAC 06 would contain:

HAC 06: CATHETHER-ASSOCIATED URINARY TRACT INFECTION (UTI)
SECONDARY DIAGNOSIS

  T8351XA CC  Infection and inflammatory reaction due to indwelling urinary catheter, initial encounter

OR SECONDARY DIAGNOSIS

  B3741     Candidal cystitis and urethritis
  B3749     Other urogenital candidiasis
  N10   CC  Acute tubulo-interstitial nephritis
  N340  CC  Urethral abscess
  N390  CC  Urinary tract infection, site not specified

WITH SECONDARY DIAGNOSIS

  T8351XA CC  Infection and inflammatory reaction due to indwelling urinary catheter, initial encounter

HAC 07 would contain the following, and so on:

HAC 07: VASCULAR CATHETHER-ASSOCIATED INFECTION
SECONDARY DIAGNOSIS

  T80211A CC  Bloodstream infection due to central venous catheter, initial encounter
  T80212A CC  Local infection due to central venous catheter, initial encounter
  T80218A CC  Other infection due to central venous catheter, initial encounter
  T80219A CC  Unspecified infection due to central venous catheter, initial encounter

I started with some code:

$filename = "HAC.txt"
$output_file = "extract_$HAC06"

$extract = @()
select-string -path $filename -pattern "HAC" -context 0,1 |
    foreach-object {
    $extract += $_.line
    $extract += $_.context.postcontext
    }

$extract | out-file $output_file

But I'm stuck... any help?

1 answer:

Answer 0 (score: 1)

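You can import all the text as one multi-line string, split it at the HAC lines, and then export each piece to a file named after the HAC number listed in its first line. Something like this: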

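# Read the file and join its lines back into one multi-line string
$AllText = (Get-Content "HAC.txt") -join "`r`n"
# Split just before each "HAC <digit>", keep the chunks that start with a HAC code,
# and write each chunk to a file named after that code (e.g. "HAC 06.txt")
$AllText -Split "(?=HAC \d)" | Where {$_ -match "^(HAC \d+)"} | ForEach {Set-Content -Value $_ -Path ($Matches[1] + '.txt')}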

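This outputs three files named after the HAC codes (HAC 06.txt, HAC 07.txt and HAC 08.txt) in the current directory, each containing the section you are looking for.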

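Edit: OK, if you want to change where the files are written, we can add a path like this (keep the trailing backslash, since the path is built by plain string concatenation):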

$OutFolder = 'C:\Path\For\Output\'
$AllText = (Get-Content "HAC.txt") -join "`r`n"
$AllText -Split "(?=HAC \d)"| Where{$_ -match "^(HAC \d+)"} | ForEach{Set-Content -Value $_ -Path ($OutFolder + $Matches[1] + '.txt')}
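If HAC.txt is large, a line-by-line variant avoids holding the whole file in one string. The sketch below is not part of the original answer: `Split-HacFile` and its `-Path`/`-OutFolder` parameters are hypothetical names, the output folder is assumed to exist already, and each section is assumed to start on a line beginning with "HAC" and its number, as in the sample above. It relies on `switch -Regex -File`, which reads the file one line at a time, and on `Join-Path`, so the folder does not need a trailing backslash.

```powershell
# Sketch only: split a HAC listing into one file per "HAC nn" section by streaming lines.
# Split-HacFile, -Path and -OutFolder are hypothetical names chosen for this example.
function Split-HacFile {
    param(
        [string]$Path,      # input file, e.g. HAC.txt
        [string]$OutFolder  # existing folder that receives "HAC 06.txt", "HAC 07.txt", ...
    )
    $current = $null   # output file for the section currently being read

    switch -Regex -File $Path {
        '^(HAC \d+)' {
            # Header line: start a new output file (overwrites any older copy)
            $current = Join-Path $OutFolder ($Matches[1] + '.txt')
            Set-Content -Path $current -Value $_
        }
        default {
            # Any other line belongs to the current section;
            # lines before the first header have no target and are skipped
            if ($current) { Add-Content -Path $current -Value $_ }
        }
    }
}
```

Call it as `Split-HacFile -Path 'HAC.txt' -OutFolder 'C:\Path\For\Output'`. `Add-Content` reopens the output file for every line, which keeps the sketch simple but is slower than buffering each section; for a file of this size that is unlikely to matter.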