Parsing an OUTLOOK.HOL file into CSV

Time: 2020-11-10 20:25:34

Tags: powershell csv parsing structure

The OUTLOOK.HOL (holidays) file has the following structure:

[Portugal] 207
All Saints' Day,2021/11/1
All Saints' Day,2022/11/1
Assumption,2021/8/15
Assumption,2022/8/15
Carnival,2021/2/16
Carnival,2022/3/1

[Puerto Rico] 489
Birthday of Eugenio María de Hostos,2021/1/11
Birthday of Eugenio María de Hostos,2022/1/10
Birthday of José de Diego,2021/4/19
Birthday of José de Diego,2022/4/18
Birthday of Don Luis Muñoz Rivera,2021/7/19
Birthday of Don Luis Muñoz Rivera,2022/7/18

[Qatar] 118
...

How can I use PowerShell to parse this file into structured data, so that it can be written to a CSV file with the following header:

Country;Number;Holiday name;Date

/Michal

1 answer:

Answer 0 (score: 0)

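You need to loop through the lines of the file one by one and use regular expressions to parse out the different "fields": a country header line sets the current country and number, and every holiday line that follows is combined with those values.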

$result = switch -Regex -File 'D:\Test\outlook.hol' {
    '^\[([^\]]+)\]\s+(\d+)' { 
        # found a country header line, remember its values for the data lines that follow
        $country = $matches[1]
        $number = $matches[2]
    }
    '^([^,]+),(\d{4}/\d{1,2}/\d{1,2})$' { 
        # found a data line, output a PSObject
        [PsCustomObject]@{
            Country      = $country
            Number       = $number
            Holiday_name = $matches[1]
            Date         = $matches[2]
        }
    }
}

# output on screen
$result | Format-Table -AutoSize

# output to CSV file
$result | Export-Csv -Path 'D:\Test\OutlookHolidays.csv' -NoTypeInformation -Encoding UTF8

Output (on screen):

Country     Number Holiday_name                        Date     
-------     ------ ------------                        ----     
Portugal    207    All Saints' Day                     2021/11/1
Portugal    207    All Saints' Day                     2022/11/1
Portugal    207    Assumption                          2021/8/15
Portugal    207    Assumption                          2022/8/15
Portugal    207    Carnival                            2021/2/16
Portugal    207    Carnival                            2022/3/1 
Puerto Rico 489    Birthday of Eugenio María de Hostos 2021/1/11
Puerto Rico 489    Birthday of Eugenio María de Hostos 2022/1/10
Puerto Rico 489    Birthday of José de Diego           2021/4/19
Puerto Rico 489    Birthday of José de Diego           2022/4/18
Puerto Rico 489    Birthday of Don Luis Muñoz Rivera   2021/7/19
Puerto Rico 489    Birthday of Don Luis Muñoz Rivera   2022/7/18
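
The question asks for a semicolon-separated header, while Export-Csv writes commas by default. A minimal sketch of one way to get that layout, assuming the -Delimiter parameter is acceptable here. The two sample objects are illustrative stand-ins for $result, and the column names stay as in the answer (Country, Number, Holiday_name, Date) rather than the exact header wording in the question:

```powershell
# Illustrative sample objects, shaped like the ones the switch statement emits
$result = @(
    [PsCustomObject]@{ Country = 'Portugal'; Number = '207'; Holiday_name = 'Carnival'; Date = '2021/2/16' }
    [PsCustomObject]@{ Country = 'Portugal'; Number = '207'; Holiday_name = 'Carnival'; Date = '2022/3/1' }
)

# ConvertTo-Csv returns the CSV text as strings, which makes the result easy to inspect;
# -Delimiter replaces the default comma with a semicolon
$csvLines = $result | ConvertTo-Csv -Delimiter ';' -NoTypeInformation

# First line is the header: "Country";"Number";"Holiday_name";"Date"
$csvLines
```

Export-Csv accepts the same -Delimiter parameter, so adding -Delimiter ';' to the Export-Csv call above should give the same layout in the file.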

Regex 1 details:

^                  Assert position at the beginning of the string
\[                 Match the character “[” literally
(                  Match the regular expression below and capture its match into backreference number 1
   [^\]]           Match any character that is NOT a “]”
      +            Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)                 
\]                 Match the character “]” literally
\s                 Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   +               Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(                  Match the regular expression below and capture its match into backreference number 2
   \d              Match a single digit 0..9
      +            Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
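
To try the first pattern on its own, here is a small sketch using the -match operator, which fills the automatic $Matches variable in the same way as switch -Regex. The sample line is taken from the question:

```powershell
# A country header line as it appears in the question
$line = '[Puerto Rico] 489'

if ($line -match '^\[([^\]]+)\]\s+(\d+)') {
    $country = $Matches[1]   # text between the square brackets
    $number  = $Matches[2]   # digits after the whitespace
}

"$country / $number"
```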

Regex 2 details:

^                 Assert position at the beginning of the string
(                 Match the regular expression below and capture its match into backreference number 1
   [^,]           Match any character that is NOT a “,”
      +           Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)                
,                 Match the character “,” literally
(                 Match the regular expression below and capture its match into backreference number 2
   \d             Match a single digit 0..9
      {4}         Exactly 4 times
   /              Match the character “/” literally
   \d             Match a single digit 0..9
      {1,2}       Between one and 2 times, as many times as possible, giving back as needed (greedy)
   /              Match the character “/” literally
   \d             Match a single digit 0..9
      {1,2}       Between one and 2 times, as many times as possible, giving back as needed (greedy)
)                
$                 Assert position at the end of the string (or before the line break at the end of the string, if any)
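
The second pattern can be checked the same way. The sketch below also converts the captured date text into a [datetime]. That conversion and the 'yyyy/M/d' format string are assumptions added for illustration, not part of the original answer, which keeps the date as a string:

```powershell
$pattern = '^([^,]+),(\d{4}/\d{1,2}/\d{1,2})$'

# A header line contains no comma, so it is not mistaken for a data line
$headerIsData = '[Portugal] 207' -match $pattern

# A data line as it appears in the question
if ('Carnival,2022/3/1' -match $pattern) {
    $holiday = $Matches[1]
    # Assumed format: four-digit year, then month and day without leading zeros
    $date = [datetime]::ParseExact($Matches[2], 'yyyy/M/d', [cultureinfo]::InvariantCulture)
    $iso  = $date.ToString('yyyy-MM-dd', [cultureinfo]::InvariantCulture)
}

"$holiday on $iso"
```

Storing a real date instead of the raw text makes it possible to sort or filter the holidays by date before exporting them.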