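PowerShell + Regex: match and output to CSV

Time: 2016-02-26 20:24:30

Tags: regex csv powershell

I'm very new to PowerShell and RegEx and would really appreciate your help. I have a file, "print.txt", with about 35,000 lines. I've been asked to find a way to convert it to CSV for further manipulation in Excel.

Unfortunately I have no control over the format of print.txt, so I'm stuck with it as is.

Sample from print.txt: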

---------  #1157 11/06/2015 09:44:21  Total: 2482.3  ---------
RCPE:  101 ID: 204 WKOD:    0 OPRT:    0 TARE: 13.6
MAT      ADDI(2) REGR(4) ADDI(5) ADDI(6) NATU(8)
              2%     25%    0.5%    1.3%    100 
FINA R     1.89   25.36    0.54    1.31  100.00
FINA W     33.7   629.4     9.6    23.3  1786.1
1st DW     22.8   629.4     9.6    23.3  1786.1
1st DT     79.0  1578.0  3622.0  9753.0  8468.0
1st FR   449.37  396.19    2.47    2.38  212.82
 DW/DT   288.40  398.88    2.66    2.39  210.93
FRate    449.37  396.19    2.57    2.38  211.87
Retry#     02                                      

---------  #1158 11/06/2015 09:45:40  Total: 2513.7  ---------
RCPE:  101 ID: 204 WKOD:    0 OPRT:    0 TARE: 12.4
MAT      ADDI(2) REGR(4) ADDI(5) ADDI(6) NATU(8)
              2%     25%    0.5%    1.3%    100 
FINA R     1.81   25.48    0.49    1.28  100.00
FINA W     32.8   640.4     8.8    23.2  1808.4
1st DW     21.1   640.4     8.8    23.2  1705.8
1st DT     80.0  1578.0  3524.0  9875.0  8456.0
1st FR   449.37  396.19    2.57    2.38  211.87
 DW/DT   263.20  405.85    2.51    2.35  201.73
FRate    449.37  396.19    2.57    2.38  206.80
Retry#     01                                 01  

---------  #1159 11/06/2015 09:46:43  Total: 2484.9  ---------
RCPE:  101 ID: 204 WKOD:    0 OPRT:    0 TARE: 12.3
MAT      ADDI(2) REGR(4) ADDI(5) ADDI(6) NATU(8)
              2%     25%    0.5%    1.3%    100 
FINA R     1.83   25.36    0.51    1.26  100.00
FINA W     32.8   630.2     9.1    22.6  1790.2
1st DW     24.3   630.2     9.1    22.6  1790.2
1st DT     80.0  1578.0  3489.0  9775.0  8710.0
1st FR   449.37  396.19    2.57    2.38  206.80
 DW/DT   303.24  399.39    2.60    2.31  205.53
FRate    449.37  396.19    2.57    2.38  206.80
Retry#     01                                      

---------  #1160 11/06/2015 09:47:58  Total: 2581.8  ---------
RCPE:  101 ID: 204 WKOD:    0 OPRT:    0 TARE: 12.7
MAT      ADDI(2) REGR(4) ADDI(5) ADDI(6) NATU(8)
              2%     25%    0.5%    1.3%    100 
FINA R     1.91   25.06    0.49    1.30  100.00
FINA W     35.6   646.9     9.1    24.3  1865.9
1st DW     23.8   646.9     7.5    24.3  1865.9
1st DT     83.0  1578.0  3636.0 10188.0  8633.0
1st FR   449.37  396.19    2.57    2.38  206.80
 DW/DT   287.02  409.98    2.07    2.38  216.13
FRate    449.37  396.19    2.32    2.38  211.47
Retry#     02               01                    

---------  #1161 11/06/2015 09:49:01  Total: 2645.1  ---------
RCPE:  101 ID: 204 WKOD:    0 OPRT:    0 TARE: 12.3
MAT      ADDI(2) REGR(4) ADDI(5) ADDI(6) NATU(8)
              2%     25%    0.5%    1.3%    100 
FINA R     1.87   24.36    0.52    1.34  100.00
FINA W     36.1   644.3    10.1    25.9  1928.8
1st DW     24.8   644.3    10.1    25.9  1928.8
1st DT     86.0  1578.0  4159.0 10532.0  8454.0
1st FR   449.37  396.19    2.32    2.38  211.47
 DW/DT   288.18  408.28    2.43    2.46  228.15
FRate    449.37  396.19    2.32    2.42  219.81
Retry#     02                                      

I need a script, preferably PowerShell, that parses print.txt and writes the result to output.csv.

Sample output.csv (header row created manually):

Cycle #,Date,Time,Total Cycle Weight,RCPE,WSB ID #,WKOD #,Op #,TARE,MAT: ADDI(2),MAT: REGR(4),MAT: ADDI(5),MAT: ADDI(6),MAT: NATU(8),FINA R: ADDI(2),FINA R: REGR(4),FINA R: ADDI(5),FINA R: ADDI(6),FINA R: NATU(8),FINA W: ADDI(2),FINA W: REGR(4),FINA W: ADDI(5),FINA W: ADDI(6),FINA W: NATU(8),1st DW: ADDI(2),1st DW: REGR(4),1st DW: ADDI(5),1st DW: ADDI(6),1st DW: NATU(8),1st DT: ADDI(2),1st DT: REGR(4),1st DT: ADDI(5),1st DT: ADDI(6),1st DT: NATU(8),1st FR: ADDI(2),1st FR: REGR(4),1st FR: ADDI(5),1st FR: ADDI(6),1st FR: NATU(8),DW/DT: ADDI(2),DW/DT: REGR(4),DW/DT: ADDI(5),DW/DT: ADDI(6),DW/DT: NATU(8),FRate: ADDI(2),FRate: REGR(4),FRate: ADDI(5),FRate: ADDI(6),FRate: NATU(8),Retry#: ADDI(2),Retry#: REGR(4),Retry#: ADDI(5),Retry#: ADDI(6),Retry#: NATU(8)
1157,2015-11-06,09:44:21,2482.3,101,204,0,0,13.6,2%,25%,0.50%,1.30%,100,1.89,25.36,0.54,1.31,100.00,33.70,629.40,9.60,23.30,1786.10,22.80,629.40,9.60,23.30,1786.10,79.00,1578.00,3622.00,9753.00,8468.00,449.37,396.19,2.47,2.38,212.82,288.40,398.88,2.66,2.39,210.93,449.37,396.19,2.57,2.38,211.87,02,,,,
1158,2015-11-06,09:45:40,2513.7,101,204,0,0,12.4,2%,25%,0.50%,1.30%,100,1.81,25.48,0.49,1.28,100.00,32.80,640.40,8.80,23.20,1808.40,21.10,640.40,8.80,23.20,1705.80,80.00,1578.00,3524.00,9875.00,8456.00,449.37,396.19,2.57,2.38,211.87,263.20,405.85,2.51,2.35,201.73,449.37,396.19,2.57,2.38,206.80,01,,,,01
1159,2015-11-06,09:46:43,2484.9,101,204,0,0,12.3,2%,25%,0.50%,1.30%,100,1.83,25.36,0.51,1.26,100.00,32.80,630.20,9.10,22.60,1790.20,24.30,630.20,9.10,22.60,1790.20,80.00,1578.00,3489.00,9775.00,8710.00,449.37,396.19,2.57,2.38,206.80,303.24,399.39,2.60,2.31,205.53,449.37,396.19,2.57,2.38,206.80,01,,,,
1160,2015-11-06,09:47:58,2581.8,101,204,0,0,12.7,2%,25%,0.50%,1.30%,100,1.91,25.06,0.49,1.30,100.00,35.60,646.90,9.10,24.30,1865.90,23.80,646.90,7.50,24.30,1865.90,83.00,1578.00,3636.00,10188.00,8633.00,449.37,396.19,2.57,2.38,206.80,287.02,409.98,2.07,2.38,216.13,449.37,396.19,2.32,2.38,211.47,02,,01,,
1161,2015-11-06,09:49:01,2645.1,101,204,0,0,12.3,2%,25%,0.50%,1.30%,100,1.87,24.36,0.52,1.34,100.00,36.10,644.30,10.10,25.90,1928.80,24.80,644.30,10.10,25.90,1928.80,86.00,1578.00,4159.00,10532.00,8454.00,449.37,396.19,2.32,2.38,211.47,288.18,408.28,2.43,2.46,228.15,449.37,396.19,2.32,2.42,219.81,02,,,,

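Would anyone care to take a stab at this? I've read a number of similar requests here, but haven't had much luck with my own implementation.

2 answers:

Answer 0 (score: 1)

Just a start, but you can see where this is heading :) The following regex matches the first two lines of a record (in free-spacing mode):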

\#(?P<cycle>\d+)\s
  (?P<date>[\d/]+)\s
  (?P<time>[\d:]+)\s+
  Total:\s(?P<total>[\d.]+)[-\s]+
  RCPE:\s+(?P<rcpe>\d+)\s
  ID:\s(?P<id>\d+)\s
  WKOD:\s+(?P<wkod>\d+)\s
  OPRT:\s+(?P<oprt>\d+)\s
  TARE:\s(?P<tare>[.\d]+)

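After that, you just need to glue the individual parts together. See a demo on regex101.com. Apart from that, @Bacon Bits may be right: you might be better off looking for a freelancer.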
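Note that `(?P<name>...)` is the PCRE/Python spelling of a named group; the .NET regex engine that PowerShell uses expects `(?<name>...)` instead. Below is a minimal sketch of the same pattern adapted for PowerShell, run against the first two lines of the question's sample. The `$sample`, `$pattern` and `$headers` names are illustrative, the property names are borrowed from the question's CSV header, and `[pscustomobject]` assumes PowerShell 3.0 or later.

```powershell
# First two lines of the first record from the question's sample.
$sample = @'
---------  #1157 11/06/2015 09:44:21  Total: 2482.3  ---------
RCPE:  101 ID: 204 WKOD:    0 OPRT:    0 TARE: 13.6
'@

# (?x) turns on free-spacing mode, so the line breaks inside the pattern are ignored.
# Named groups use the .NET form (?<name>...) rather than (?P<name>...).
$pattern = @'
(?x)
\#(?<cycle>\d+)\s
(?<date>[\d/]+)\s
(?<time>[\d:]+)\s+
Total:\s(?<total>[\d.]+)[-\s]+
RCPE:\s+(?<rcpe>\d+)\s
ID:\s(?<id>\d+)\s
WKOD:\s+(?<wkod>\d+)\s
OPRT:\s+(?<oprt>\d+)\s
TARE:\s(?<tare>[.\d]+)
'@

# One object per matched record header.
$headers = foreach ($m in [regex]::Matches($sample, $pattern)) {
    [pscustomobject]@{
        'Cycle #'            = $m.Groups['cycle'].Value
        'Date'               = $m.Groups['date'].Value
        'Time'               = $m.Groups['time'].Value
        'Total Cycle Weight' = $m.Groups['total'].Value
        'RCPE'               = $m.Groups['rcpe'].Value
        'WSB ID #'           = $m.Groups['id'].Value
        'WKOD #'             = $m.Groups['wkod'].Value
        'Op #'               = $m.Groups['oprt'].Value
        'TARE'               = $m.Groups['tare'].Value
    }
}
$headers
```

Fed the whole file (for example via `Get-Content -Raw`), `[regex]::Matches` should return one match per record header; the remaining rows of each record would still have to be parsed separately.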

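Answer 1 (score: 0)

I would probably use regex myself, but here are a couple of alternatives.

If you have PowerShell 5.0, you could try ConvertFrom-String, which works from a template that is easy to create: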

$Template = @'
---------  #{Cycle*:1157} {Date:11/06/2015} {Time:09:44:21}  Total: {TotalCycleWeight:2482.3}  ---------
RCPE:  {RCPE:101} ID: {WSBID:204} WKOD:    {WKOD:0} OPRT:    {Op:0} TARE: {TARE:13.6}
MAT      ADDI(2) REGR(4) ADDI(5) ADDI(6) NATU(8)
              {MAT_ADDI2:2%}     {MAT_REGR4:25%}    {MAT_ADDI5:0.5%}    {MAT_ADDI6:1.3%}    {MAT_NATU8:100}
FINA R     {FINAR_ADDI2:1.89}   {FINAR_REGR4:25.36}    {FINAR_ADDI5:0.54}    {FINAR_ADDI6:1.31}  {FINAR_NATU8:100.00}
FINA W     {FINAW_ADDI2:33.7}   {FINAW_REGR4:629.4}     {FINAW_ADDI5:9.6}    {FINAW_ADDI6:23.3}  {FINAW_NATU8:1786.1}
1st DW     {DW1st_ADDI2:22.8}   {DW1st_REGR4:629.4}     {DW1st_ADDI5:9.6}    {DW1st_ADDI6:23.3}  {DW1st_NATU8:1786.1}
1st DT     {DT1st_ADDI2:79.0}  {DT1st_REGR4:1578.0}  {DT1st_ADDI5:3622.0}  {DT1st_ADDI6:9753.0}  {DT1st_NATU8:8468.0}
1st FR   {FR1st_ADDI2:449.37}  {FR1st_REGR4:396.19}    {FR1st_ADDI5:2.47}    {FR1st_ADDI6:2.38}  {FR1st_NATU8:212.82}
 DW/DT   {DWDT_ADDI2:288.40}  {DWDT_REGR4:398.88}    {DWDT_ADDI5:2.66}    {DWDT_ADDI6:2.39}  {DWDT_NATU8:210.93}
FRate    {FRate_ADDI2:449.37}  {FRate_REGR4:396.19}    {FRate_ADDI5:2.57}    {FRate_ADDI6:2.38}  {FRate_NATU8:211.87}
Retry#     {Retry_ADDI2:02}   {Retry_REGR4:0}  {Retry_ADDI5:0}   {Retry_ADDI6:0}    {Retry_NATU8:0}

---------  #{Cycle*:1145} {Date:11/06/2015} {Time:09:44:21}  Total: {TotalCycleWeight:2482.3}  ---------
RCPE:  {RCPE:101} ID: {WSBID:204} WKOD:    {WKOD:0} OPRT:    {Op:0} TARE: {TARE:13.6}
MAT      ADDI(2) REGR(4) ADDI(5) ADDI(6) NATU(8)
              {MAT_ADDI2:2%}     {MAT_REGR4:25%}    {MAT_ADDI5:0.5%}    {MAT_ADDI6:1.3%}    {MAT_NATU8:100}
'@

Get-Content .\Test.txt | ConvertFrom-String -TemplateContent $Template

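However, I had trouble with the Retry line, because it doesn't have a value in every field. If you manage to solve that, this is a good option if you're not comfortable writing regex.

You could also use splitting and a lot of hard-coded values.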

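# Read the whole file as one string and split it into records on blank lines
# (works for both CRLF and LF line endings); skip any empty chunks.
(Get-Content .\Test.txt -Raw) -split '(?:\r?\n){2,}' | ? { $_.Trim() } | % {
    $data = $_ -split '\r?\n'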

    $Cycle,$Date,$Time,$TotalCycleWeight = @($data[0] -replace '-+\s+|#|Total:' -split '\s+')[0..3]
    $RCPE,$WSBID,$WKOD,$Op,$TARE = @($data[1] -replace '\w+:\s+' -split '\s+')[0..4]
    $MAT_ADDI2,$MAT_REGR4,$MAT_ADDI5,$MAT_ADDI6,$MAT_NATU8 = @($data[3] -split '\s+')[1..5]
    $FINAR_ADDI2,$FINAR_REGR4,$FINAR_ADDI5,$FINAR_ADDI6,$FINAR_NATU8 = @($data[4] -split '\s+')[2..6]
    $FINAW_ADDI2,$FINAW_REGR4,$FINAW_ADDI5,$FINAW_ADDI6,$FINAW_NATU8 = @($data[5] -split '\s+')[2..6]
    $1stDW_ADDI2,$1stDW_REGR4,$1stDW_ADDI5,$1stDW_ADDI6,$1stDW_NATU8 = @($data[6] -split '\s+')[2..6]
    $1stDT_ADDI2,$1stDT_REGR4,$1stDT_ADDI5,$1stDT_ADDI6,$1stDT_NATU8 = @($data[7] -split '\s+')[2..6]
    $1stFR_ADDI2,$1stFR_REGR4,$1stFR_ADDI5,$1stFR_ADDI6,$1stFR_NATU8 = @($data[8] -split '\s+')[2..6]
    $DWDT_ADDI2,$DWDT_REGR4,$DWDT_ADDI5,$DWDT_ADDI6,$DWDT_NATU8 = @($data[9] -split '\s+')[2..6]
    $FRate_ADDI2,$FRate_REGR4,$FRate_ADDI5,$FRate_ADDI6,$FRate_NATU8 = @($data[10] -split '\s+')[1..5]
    $Retry_ADDI2 = $data[11].Substring(9,5).Trim() | ? { $_ }
    $Retry_REGR4 = $data[11].Substring(15,9).Trim() | ? { $_ }
    $Retry_ADDI5 = $data[11].Substring(25,9).Trim() | ? { $_ }
    $Retry_ADDI6 = $data[11].Substring(35,9).Trim() | ? { $_ }
    $Retry_NATU8 = $data[11].Substring(45,$data[11].Length-45).Trim() | ? { $_ }

    New-Object -TypeName psobject -Property @{
        'Cycle #' = $Cycle
        'Date' = $Date
        'Time' = $Time
        'Total Cycle Weight' = $TotalCycleWeight
        'RCPE' = $RCPE
        'WSB ID #' = $WSBID
        'WKOD #' = $WKOD
        'Op #' = $Op
        'TARE' = $TARE
        'MAT: ADDI(2)' = $MAT_ADDI2
        'MAT: REGR(4)' = $MAT_REGR4
        'MAT: ADDI(5)' = $MAT_ADDI5
        'MAT: ADDI(6)' = $MAT_ADDI6
        'MAT: NATU(8)' = $MAT_NATU8
        'FINA R: ADDI(2)' = $FINAR_ADDI2
        'FINA R: REGR(4)' = $FINAR_REGR4
        'FINA R: ADDI(5)' = $FINAR_ADDI5
        'FINA R: ADDI(6)' = $FINAR_ADDI6
        'FINA R: NATU(8)' = $FINAR_NATU8
        'FINA W: ADDI(2)' = $FINAW_ADDI2
        'FINA W: REGR(4)' = $FINAW_REGR4
        'FINA W: ADDI(5)' = $FINAW_ADDI5
        'FINA W: ADDI(6)' = $FINAW_ADDI6
        'FINA W: NATU(8)' = $FINAW_NATU8
        '1st DW: ADDI(2)' = $1stDW_ADDI2
        '1st DW: REGR(4)' = $1stDW_REGR4
        '1st DW: ADDI(5)' = $1stDW_ADDI5
        '1st DW: ADDI(6)' = $1stDW_ADDI6
        '1st DW: NATU(8)' = $1stDW_NATU8
        '1st DT: ADDI(2)' = $1stDT_ADDI2
        '1st DT: REGR(4)' = $1stDT_REGR4
        '1st DT: ADDI(5)' = $1stDT_ADDI5
        '1st DT: ADDI(6)' = $1stDT_ADDI6
        '1st DT: NATU(8)' = $1stDT_NATU8
        '1st FR: ADDI(2)' = $1stFR_ADDI2
        '1st FR: REGR(4)' = $1stFR_REGR4
        '1st FR: ADDI(5)' = $1stFR_ADDI5
        '1st FR: ADDI(6)' = $1stFR_ADDI6
        '1st FR: NATU(8)' = $1stFR_NATU8
        'DW/DT: ADDI(2)' = $DWDT_ADDI2
        'DW/DT: REGR(4)' = $DWDT_REGR4
        'DW/DT: ADDI(5)' = $DWDT_ADDI5
        'DW/DT: ADDI(6)' = $DWDT_ADDI6
        'DW/DT: NATU(8)' = $DWDT_NATU8
        'FRate: ADDI(2)' = $FRate_ADDI2
        'FRate: REGR(4)' = $FRate_REGR4
        'FRate: ADDI(5)' = $FRate_ADDI5
        'FRate: ADDI(6)' = $FRate_ADDI6
        'FRate: NATU(8)' = $FRate_NATU8
        'Retry#: ADDI(2)' = $Retry_ADDI2
        'Retry#: REGR(4)' = $Retry_REGR4
        'Retry#: ADDI(5)' = $Retry_ADDI5
        'Retry#: ADDI(6)' = $Retry_ADDI6
        'Retry#: NATU(8)' = $Retry_NATU8
    }
} | Select-Object 'Cycle #','Date','Time','Total Cycle Weight','RCPE','WSB ID #','WKOD #','Op #','TARE','MAT: ADDI(2)','MAT: REGR(4)','MAT: ADDI(5)','MAT: ADDI(6)','MAT: NATU(8)','FINA R: ADDI(2)','FINA R: REGR(4)','FINA R: ADDI(5)','FINA R: ADDI(6)','FINA R: NATU(8)','FINA W: ADDI(2)','FINA W: REGR(4)','FINA W: ADDI(5)','FINA W: ADDI(6)','FINA W: NATU(8)','1st DW: ADDI(2)','1st DW: REGR(4)','1st DW: ADDI(5)','1st DW: ADDI(6)','1st DW: NATU(8)','1st DT: ADDI(2)','1st DT: REGR(4)','1st DT: ADDI(5)','1st DT: ADDI(6)','1st DT: NATU(8)','1st FR: ADDI(2)','1st FR: REGR(4)','1st FR: ADDI(5)','1st FR: ADDI(6)','1st FR: NATU(8)','DW/DT: ADDI(2)','DW/DT: REGR(4)','DW/DT: ADDI(5)','DW/DT: ADDI(6)','DW/DT: NATU(8)','FRate: ADDI(2)','FRate: REGR(4)','FRate: ADDI(5)','FRate: ADDI(6)','FRate: NATU(8)','Retry#: ADDI(2)','Retry#: REGR(4)','Retry#: ADDI(5)','Retry#: ADDI(6)','Retry#: NATU(8)' | Export-Csv .\Test.csv -NoTypeInformation
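
Two loose ends remain with the script above. The desired output.csv shows the date as 2015-11-06 while the script passes 11/06/2015 through unchanged, and the `Substring` calls on the Retry line would throw if a line were shorter than the hard-coded offsets (for instance if trailing spaces had been stripped). The following is a minimal sketch of two helpers for that. `Convert-CycleDate` and `Get-FixedField` are hypothetical names, and the `MM/dd/yyyy` input format is an assumption, since the sample alone cannot distinguish it from `dd/MM/yyyy`.

```powershell
function Convert-CycleDate {
    param([string]$Text)
    # ASSUMPTION: print.txt writes dates as MM/dd/yyyy.
    $culture = [System.Globalization.CultureInfo]::InvariantCulture
    [datetime]::ParseExact($Text, 'MM/dd/yyyy', $culture).ToString('yyyy-MM-dd', $culture)
}

function Get-FixedField {
    param([string]$Line, [int]$Start, [int]$Length)
    # Return $null instead of throwing when the line ends before the column.
    if ($Start -ge $Line.Length) { return $null }
    $len = [Math]::Min($Length, $Line.Length - $Start)
    $value = $Line.Substring($Start, $len).Trim()
    if ($value) { $value } else { $null }
}

Convert-CycleDate '11/06/2015'

# A Retry line with its trailing spaces stripped: '02' at offset 11, '01' at offset 28.
$retry = 'Retry#'.PadRight(11) + '02'.PadRight(17) + '01'
Get-FixedField $retry 9 5     # first column, same offsets as the Substring calls above
Get-FixedField $retry 25 9    # third column
Get-FixedField $retry 45 9    # fifth column: the line ends before it, so nothing is returned
```

In the script above these could be used as `'Date' = Convert-CycleDate $Date` and `$Retry_ADDI2 = Get-FixedField $data[11] 9 5`, and likewise for the other Retry columns.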