Data from a text file

Time: 2019-07-11 12:13:39

Tags: powershell

I have a text file, toto.txt, with the following content:

    Time: 11/23/2018 17:03:46
    User: NEON
    Web Site: https://www.seznam.cz
    Top

    Time: 11/23/2018 17:05:10
    User: NEON
    Web Site: www.autojournal.cz%252Fstat-prodava-zabavena-auta-padouchu-budou-levnejsi-nez-jine-ojetiny-2%252F/keFrdPDIZzLJBC2fxX7EIQ?utm_source=www.seznam.cz&utm_medium=sekce-z-internetu
    Top

    Time: 11/23/2018 17:05:11
    User: NEON
    Web Site: www.autojournal.cz/stat-prodava-zabavena-auta-padouchu-budou-levnejsi-nez-jine-ojetiny-2/?utm_source=www.seznam.cz&utm_medium=sekce-z-internetu
    Top
    ... etc. ...

This is the code I use to export the data:

 ((Get-Content C:\Users\user\Desktop\test\toto.txt -RAW) -split '\n(?=Time:)') | % {
     $x = $_ -split '\r'
     New-Object PSOBJECT -Property @{
         Time  = [regex]::Match($x[0],'(?<=Time:\s*)\b.*\b')
         User = [regex]::Match($x[1],'(?<=User:\s*)\b.*\b')
         Web = [regex]::Match($x[2],'(?<=Site:\s*)\b.*\b')
     }
 } | out-file  C:\Users\user\Desktop\test\result.txt

The problem is that the long URLs (the Web Site values) are cut off in result.txt.

The structure I need in result.txt is:

datetime;$url, for example: 2019-01-15 15:06:03;$www.autojournal.cz/stat-prodava-zabavena-auta-padouchu-budou-levnejsi-nez-jine-ojetiny-2/?utm_source=www.seznam.cz&utm_medium=sekce-z-internetu

What I actually get in result.txt is: 11/23/2018 17:05:10 NEON  www.autojournal.cz%252Fstat-prodava-zabavena-auta-padouchu-budou-levnejsi-nez-jine-ojetiny-2%25 ...

I can already convert the date and time:

 (Get-Content C:\Users\user\Desktop\test\result.txt) | 
 Foreach-Object {$_ -replace "([0-9]+)/+([0-9]+)/+([0-9]+)", '$3-$1-$2'} | 
 Foreach-Object {$_ -replace "([0-9]+):+([0-9]+):+([0-9]+)", '$1-$2-$3;$'} |
 Set-Content C:\Users\user\Desktop\test\result2.txt



1 answer:

Answer 0 (score: 0)

Out-File has a -Width parameter. You can use it to stop the output lines from being truncated:

((Get-Content C:\Users\user\Desktop\test\toto.txt -RAW) -split '\n(?=Time:)') | % {
    $x = $_ -split '\r'
    New-Object PSOBJECT -Property @{
        Time  = [regex]::Match($x[0],'(?<=Time:\s*)\b.*\b')
        User = [regex]::Match($x[1],'(?<=User:\s*)\b.*\b')
        Web = [regex]::Match($x[2],'(?<=Site:\s*)\b.*\b')
    }
} | out-file  C:\Users\user\Desktop\test\result.txt -Width 10000
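To see why the width matters, here is a minimal sketch of the truncation itself. It uses Out-String -Width as a stand-in for Out-File so that nothing is written to disk; I am assuming both apply the same line-width limit when rendering a table. The object and its 300-character Web value are made up for the demonstration.

```powershell
# Hypothetical object with one very long property value.
$obj = [pscustomobject]@{ Time = '11/23/2018 17:05:10'; Web = 'w' * 300 }

# Render the same table with a narrow and with a very wide line width.
$narrow = $obj | Format-Table | Out-String -Width 80
$wide   = $obj | Format-Table | Out-String -Width 10000

# Check whether the full 300-character value survived in each rendering.
$narrowHasFullValue = $narrow.Contains($obj.Web)
$wideHasFullValue   = $wide.Contains($obj.Web)
"Narrow output keeps the full value: $narrowHasFullValue"
"Wide output keeps the full value:   $wideHasFullValue"
```

The table formatter cuts a column once the line is full, so the value is only kept whole when the width is larger than the longest line you expect.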

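You should also consider working with CSV files through Import-Csv, Export-Csv and [PSCustomObject]. That is easier than splitting up txt files, and Export-Csv writes each value in full, so no width setting is needed.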

((Get-Content C:\Users\user\Desktop\test\toto.txt -Raw) -split '\n(?=Time:)') | % {
    $x = $_ -split '\r'
    # [PSCustomObject] keeps the properties in the order they are written,
    # so the CSV columns come out as Time;User;Web. .Value stores the matched text.
    [PSCustomObject]@{
        Time = [regex]::Match($x[0],'(?<=Time:\s*)\b.*\b').Value
        User = [regex]::Match($x[1],'(?<=User:\s*)\b.*\b').Value
        Web  = [regex]::Match($x[2],'(?<=Site:\s*)\b.*\b').Value
    }
} | Export-Csv C:\Users\user\Desktop\test\result.txt -Delimiter ";" -NoTypeInformation
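If you want the exact `datetime;$url` layout from the question in one pass, something along these lines might work. It is only a sketch under a few assumptions: the sample text is inlined so the snippet runs on its own, the timestamps are always in `MM/dd/yyyy HH:mm:ss` form, and the variable names (`$text`, `$records`, `$lines`) are mine, not from the question.

```powershell
# Sample input inlined for the sketch; in practice you would read toto.txt with Get-Content -Raw.
$text = @'
Time: 11/23/2018 17:03:46
User: NEON
Web Site: https://www.seznam.cz
Top

Time: 11/23/2018 17:05:11
User: NEON
Web Site: www.autojournal.cz/stat-prodava-zabavena-auta-padouchu-budou-levnejsi-nez-jine-ojetiny-2/?utm_source=www.seznam.cz&utm_medium=sekce-z-internetu
Top
'@

$invariant = [cultureinfo]::InvariantCulture

# Split before every line that starts with "Time:" and drop the empty pieces.
$records = $text -split '(?m)^(?=\s*Time:)' | Where-Object { $_.Trim() } | ForEach-Object {
    # Capture the rest of the line after each label; Trim() removes a trailing carriage return.
    $time = [regex]::Match($_, 'Time:\s*(.+)').Groups[1].Value.Trim()
    $user = [regex]::Match($_, 'User:\s*(.+)').Groups[1].Value.Trim()
    $web  = [regex]::Match($_, 'Web Site:\s*(.+)').Groups[1].Value.Trim()

    # Parse the US-style timestamp and reformat it as yyyy-MM-dd HH:mm:ss.
    $stamp = [datetime]::ParseExact($time, 'MM/dd/yyyy HH:mm:ss', $invariant)

    [pscustomobject]@{
        Time = $stamp.ToString('yyyy-MM-dd HH:mm:ss', $invariant)
        User = $user
        Web  = $web
    }
}

# Build one "datetime;$url" line per record; the $ is a literal character here.
$lines = $records | ForEach-Object { '{0};${1}' -f $_.Time, $_.Web }
$lines

# To save the result, pipe the lines to Set-Content, for example:
# $lines | Set-Content C:\Users\user\Desktop\test\result2.txt
```

Because the lines are plain strings by the time they are written, the table formatter is not involved and the long URLs are not shortened.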