The OUTLOOK.HOL (holidays) file is structured as follows:
[Portugal] 207
All Saints' Day,2021/11/1
All Saints' Day,2022/11/1
Assumption,2021/8/15
Assumption,2022/8/15
Carnival,2021/2/16
Carnival,2022/3/1
[Puerto Rico] 489
Birthday of Eugenio María de Hostos,2021/1/11
Birthday of Eugenio María de Hostos,2022/1/10
Birthday of José de Diego,2021/4/19
Birthday of José de Diego,2022/4/18
Birthday of Don Luis Muñoz Rivera,2021/7/19
Birthday of Don Luis Muñoz Rivera,2022/7/18
[Qatar] 118
...
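How can I use PowerShell to parse this file into structured data, so that it can be exported to a CSV file with the following headers:
Country;Number;Holiday name;Date
/Michal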
Answer 0 (score: 0)
You need to loop through the lines of the file one by one and use regular expressions to parse the different "fields".
$result = switch -Regex -File 'D:\Test\outlook.hol' {
    '^\[([^\]]+)\]\s+(\d+)' {
        # found a section header like "[Portugal] 207"; remember it for the data lines that follow
        $country = $matches[1]
        $number  = $matches[2]
    }
    '^([^,]+),(\d{4}/\d{1,2}/\d{1,2})$' {
        # found a data line, output a PSObject
        [PsCustomObject]@{
            Country      = $country
            Number       = $number
            Holiday_name = $matches[1]
            Date         = $matches[2]
        }
    }
}
# output on screen
$result | Format-Table -AutoSize
# output to CSV file
$result | Export-Csv -Path 'D:\Test\OutlookHolidays.csv' -NoTypeInformation -Encoding UTF8
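The question asks for a semicolon-separated header (Country;Number;Holiday name;Date), while Export-Csv writes commas and the property name Holiday_name by default. A minimal sketch of one way to get closer to that layout, assuming objects shaped like the parser's output above; the sample rows are copied from the question, and the variable names $rows and $csvLines are only illustrative:

```powershell
# Sample objects shaped like the parser's output (values copied from the question)
$result = @(
    [PsCustomObject]@{ Country = 'Portugal';    Number = '207'; Holiday_name = 'Carnival';                  Date = '2021/2/16' }
    [PsCustomObject]@{ Country = 'Puerto Rico'; Number = '489'; Holiday_name = 'Birthday of José de Diego'; Date = '2021/4/19' }
)

# Rename Holiday_name to 'Holiday name' with a calculated property
$rows = $result | Select-Object Country, Number,
    @{ Name = 'Holiday name'; Expression = { $_.Holiday_name } },
    Date

# ConvertTo-Csv returns the CSV lines as strings, which makes the format easy to inspect
$csvLines = $rows | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
$csvLines

# To write a file instead, the same parameters work with Export-Csv:
# $rows | Export-Csv -Path 'D:\Test\OutlookHolidays.csv' -Delimiter ';' -NoTypeInformation -Encoding UTF8
```

Note that the fields come out wrapped in double quotes, which CSV readers normally accept.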
Output (on screen)
Country     Number Holiday_name                        Date
-------     ------ ------------                        ----
Portugal    207    All Saints' Day                     2021/11/1
Portugal    207    All Saints' Day                     2022/11/1
Portugal    207    Assumption                          2021/8/15
Portugal    207    Assumption                          2022/8/15
Portugal    207    Carnival                            2021/2/16
Portugal    207    Carnival                            2022/3/1
Puerto Rico 489    Birthday of Eugenio María de Hostos 2021/1/11
Puerto Rico 489    Birthday of Eugenio María de Hostos 2022/1/10
Puerto Rico 489    Birthday of José de Diego           2021/4/19
Puerto Rico 489    Birthday of José de Diego           2022/4/18
Puerto Rico 489    Birthday of Don Luis Muñoz Rivera   2021/7/19
Puerto Rico 489    Birthday of Don Luis Muñoz Rivera   2022/7/18
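In the output above, Number and Date are still plain strings. If typed values are wanted, they could be converted along these lines. This is only a sketch: it assumes the dates always follow the year/month/day pattern seen in the sample, and the names $row, $invariant and $typed are illustrative:

```powershell
# One row shaped like the output above (values copied from the sample)
$row = [PsCustomObject]@{
    Country      = 'Portugal'
    Number       = '207'
    Holiday_name = "All Saints' Day"
    Date         = '2021/11/1'
}

# Parse with the invariant culture so the result does not depend on the local date settings
$invariant = [System.Globalization.CultureInfo]::InvariantCulture

$typed = [PsCustomObject]@{
    Country      = $row.Country
    Number       = [int]$row.Number
    Holiday_name = $row.Holiday_name
    # 'yyyy/M/d' accepts one- or two-digit months and days
    Date         = [datetime]::ParseExact($row.Date, 'yyyy/M/d', $invariant)
}

$typed.Date.ToString('yyyy-MM-dd', $invariant)   # 2021-11-01
```

With real [datetime] values the rows can be sorted or filtered by date, for example with Sort-Object Date.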
Regex 1 details:
^                 Assert position at the beginning of the string
\[                Match the character “[” literally
(                 Match the regular expression below and capture its match into backreference number 1
   [^\]]          Match any character that is NOT a “]” character
      +           Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\]                Match the character “]” literally
\s                Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
   +              Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(                 Match the regular expression below and capture its match into backreference number 2
   \d             Match a single digit 0..9
      +           Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
Regex 2 details:
^                 Assert position at the beginning of the string
(                 Match the regular expression below and capture its match into backreference number 1
   [^,]           Match any character that is NOT a “,”
      +           Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
,                 Match the character “,” literally
(                 Match the regular expression below and capture its match into backreference number 2
   \d             Match a single digit 0..9
      {4}         Exactly 4 times
   /              Match the character “/” literally
   \d             Match a single digit 0..9
      {1,2}       Between one and 2 times, as many times as possible, giving back as needed (greedy)
   /              Match the character “/” literally
   \d             Match a single digit 0..9
      {1,2}       Between one and 2 times, as many times as possible, giving back as needed (greedy)
)
$                 Assert position at the end of the string (or before the line break at the end of the string, if any)
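To see the two patterns at work outside the switch statement, they can be tried against single lines with the -match operator, which fills the automatic $matches variable on success. A small sketch using lines from the sample file; the variable names are illustrative:

```powershell
$headerPattern  = '^\[([^\]]+)\]\s+(\d+)'
$holidayPattern = '^([^,]+),(\d{4}/\d{1,2}/\d{1,2})$'

# A section header line: group 1 is the country, group 2 the number
if ('[Puerto Rico] 489' -match $headerPattern) {
    $country = $matches[1]   # Puerto Rico
    $number  = $matches[2]   # 489
}

# A data line: group 1 is the holiday name, group 2 the date
if ('Carnival,2022/3/1' -match $holidayPattern) {
    $holiday = $matches[1]   # Carnival
    $date    = $matches[2]   # 2022/3/1
}

# A header line contains no comma, so the data-line pattern does not match it
$headerLineIsHoliday = '[Qatar] 118' -match $holidayPattern   # False

"$country;$number;$holiday;$date"
```

Because neither pattern matches lines of the other kind, the two switch branches never both fire on the same line.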