I have an HTML file that contains links in the format
<a href="http://www.google.com">Date: 25.02.2013 10:30 Name: Google</a><br>
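I'm trying to write a PowerShell script that extracts the link, date, time and name, and writes them out in CSV format (Link, Date, Time, Name).
The following gives me the links but not the other information. Am I just missing something? The regexes work, but it would also help if the name match dropped the "Name:" prefix.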
$input_path = 'C:\temp\myfile.html'
$output_file = 'C:\temp\myfile.csv'
$regex_link = '([a-zA-Z]{4})://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)'
$regex_date = '\d{2}\.\d{2}\.\d{4}'
$regex_time = '\d{2}:\d{2}'
$regex_name = 'Name:\s([\w]*)'
$myVar = select-string -Path $input_path -Pattern $regex_link, $regex_date, $regex_time, $regex_name -AllMatches| % { $_.Matches } | % { $_.Value }
$myVar
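A likely reason only the links come back: when Select-String is given several patterns, it appears to report only the matches of the first pattern that matches a given line, so the date, time and name patterns are never reported for lines the link pattern already matched. A minimal sketch, assuming each link sits on a single line in exactly the format shown above, is to use one regex with a capture group per field; that also lets the name be captured without its "Name:" prefix. The variables `$line`, `$pattern` and `$csvLine` are illustrative names, not from the question.

```powershell
# Hypothetical sample line in the format described in the question
$line = '<a href="http://www.google.com">Date: 25.02.2013 10:30 Name: Google</a><br>'

# One pattern, one capture group per field; "Name:\s*" sits outside
# the last group, so the prefix is not part of the captured name
$pattern = 'href="([^"]+)">Date:\s*(\d{2}\.\d{2}\.\d{4})\s+(\d{2}:\d{2})\s+Name:\s*(.*?)\s*</a>'

if ($line -match $pattern) {
    # $Matches[1]..$Matches[4] hold the link, date, time and name
    $csvLine = '{0},{1},{2},{3}' -f $Matches[1], $Matches[2], $Matches[3], $Matches[4]
    $csvLine
}
```

The lazy `(.*?)` followed by `\s*</a>` keeps trailing spaces out of the name.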
Answer 0 (score: 0)
This probably isn't the cleanest solution, but it works in my tests:
$input_path = 'C:\temp\myfile.html'
$output_file = 'C:\temp\myfile.csv'
# Keep only the lines that contain a link
(Get-Content $input_path) -match "href" | % {
    # Capture link, date, time and name, join them with ";" and split them back into an array
    $data = ($_ -replace '(?:.*)href="(.*?)">Date:\s*([\w\.]+)\s*([\w\:]+)\s*Name:\s*(.*)</a>(?:.*)' , '$1;$2;$3;$4').Split(";")
    New-Object psobject -Property @{
        "Link" = $data[0].Trim()
        "Date" = $data[1].Trim()
        "Time" = $data[2].Trim()
        "Name" = $data[3].Trim()
    }
    # The hashtable passed to -Property is unordered, so Select-Object below fixes the column order
} | Select-Object Link, Date, Time, Name | Export-Csv $output_file -NoTypeInformation
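A possibly tidier variant of the same idea, sketched under the assumption of PowerShell 3.0 or later: run one regex with named groups over the whole text via `[regex]::Matches`, and emit `[pscustomobject]` values, whose property order is preserved, so neither the `;` join/split nor the `Select-Object` step is needed. `ConvertFrom-LinkHtml` is a hypothetical helper name, and the sample lines are copied from the test file below.

```powershell
# Hypothetical helper (not from the answer): parses every
# <a href="...">Date: dd.mm.yyyy hh:mm Name: ...</a> link in a string
function ConvertFrom-LinkHtml {
    param([string]$Html)

    # Named groups make each field addressable by name; "Name:\s*" is
    # outside the Name group, so the prefix is dropped
    $pattern = '<a href="(?<Link>[^"]+)">\s*Date:\s*(?<Date>\d{2}\.\d{2}\.\d{4})\s+' +
               '(?<Time>\d{2}:\d{2})\s+Name:\s*(?<Name>.*?)\s*</a>'

    foreach ($m in [regex]::Matches($Html, $pattern)) {
        # [pscustomobject] keeps the property order (PowerShell 3.0+),
        # so the CSV columns come out as Link, Date, Time, Name
        [pscustomobject]@{
            Link = $m.Groups['Link'].Value
            Date = $m.Groups['Date'].Value
            Time = $m.Groups['Time'].Value
            Name = $m.Groups['Name'].Value
        }
    }
}

# Two link lines taken from the sample Myfile.html
$html = @'
<a href="http://www.google.com">Date: 25.02.2013 10:30 Name: Googledas adka kasjiw</a><br>
<a href="http://www.google2.com">Date: 22.22.2222 20:20 Name: Google2asd addasd </a><br>
'@

$rows = @(ConvertFrom-LinkHtml -Html $html)
$rows | ConvertTo-Csv -NoTypeInformation
```

For the real file, something like `ConvertFrom-LinkHtml -Html (Get-Content $input_path -Raw) | Export-Csv $output_file -NoTypeInformation` should work. Like the answer, this relies on the markup matching the pattern exactly; a regex over HTML breaks easily if the attributes or spacing change.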
Myfile.html:
<html>
<body>
asdsanfkj
djaksl
sadjklas
<a href="http://www.google.com">Date: 25.02.2013 10:30 Name: Googledas adka kasjiw</a><br>
sadsadmdsa
<a href="http://www.google2.com">Date: 22.22.2222 20:20 Name: Google2asd addasd </a><br>
sajl
dasjdsa
asd
</body>
</html>
Myfile.csv:
"Link","Date","Time","Name"
"http://www.google.com","25.02.2013","10:30","Googledas adka kasjiw"
"http://www.google2.com","22.22.2222","20:20","Google2asd addasd"