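I have a PowerShell function that reads the contents of a file and splits each line into fields for a CSV file. Is there a way to extract a value from the link and add it as an extra column in the CSV output, while leaving the Link column unchanged?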
function Convert2CSV {
    (Get-Content $input_path) -match "href" | % {
        $data = ($_ -replace '(?:.*)href="(.*?)">Date:\s*([\w\.]+)\s*([\w\:]+)\s*Item:\s*(.*)</a>(?:.*)' , '$1;$2;$3;$4').Split(";")
        New-Object psobject -Property @{
            "Link" = $data[0]
            "Date" = $data[1]
            "Time" = $data[2]
            "Item" = $data[3]
        }
    } #| Export-Csv $output_file -NoTypeInformation
}
The value I am looking for is matched by
FeedDefault_.*?(&) or _Feed.*?(&)
I think I could add some kind of if statement at the "Link" = $data[0] part?
Sample output, as requested:
Value in Link | Link                                                              | Date       | Time  | Item         |
-------------------------------------------------------------------------------------------------------------------|
bluepebbles   | http://www.domain.com/page.html?FeedDefault_bluepebbles&something | 2013-05-19 | 13:30 | Blue Pebbles |
-------------------------------------------------------------------------------------------------------------------|
redpebbles    | http://www.domain.com/page.html?Feed_redpebbles&something         | 2013-05-19 | 13:31 | Red Pebbles  |
-------------------------------------------------------------------------------------------------------------------|
Formatted as CSV:
Value in Link,Link,Date,Time,Item
"bluepebbles","http://www.domain.com/page.html?FeedDefault_bluepebbles&something","2013-05-19","13:30","Blue Pebbles"
"redpebbles","http://www.domain.com/page.html?Feed_redpebbles&something","2013-05-19","13:31","Red Pebbles"
Entered:
$input_path = 'f:\mockup\area51\files\link.html'
$output_file = 'f:\mockup\area51\files\db_csv.csv'
$tstampCulture = [Globalization.cultureinfo]::GetCultureInfo("en-GB")
$ie = New-Object -COM "InternetExplorer.Application"
$ie.Visible = $false
$ie.Navigate("file:///$input_path")
$ie.document.getElementsByTagName("a") | % {
    $_.innerText -match 'Date:\s*([\w\.]+)\s*([\w\:]+)\s*Item:\s*(.*)'
    $obj = New-Object psobject -Property @{
        "Link" = $_.href
        "Date" = $matches[1]
        "Time" = $matches[2]
        "Item" = $matches[3]
    }
    if ( $obj.Link -match '\?Feed(?:Default)?_(.*?)&' ) {
        $obj | Add-Member -Type "NoteProperty" -Name "LinkValue" -Value $matches[1]
    }
    $obj
} #| Export-Csv $output_file -NoTypeInformation
Returned error:
You cannot call a method on a null-valued expression.
At line:12 char:38
+ $ie.document.getElementsByTagName <<<< ("a") | % {
+ CategoryInfo : InvalidOperation: (getElementsByTagName:String) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
So I'm fairly sure I have messed something up. :)
Answer 0 (score: 1)
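First, I would suggest using -match instead of -replace. The resulting $matches collection already contains the submatches you are interested in, so there is no need to build that array by hand.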
Get-Content $input_path | ? { $_.Contains("href") } | % {
    # Testing -match in an if keeps its Boolean result out of the pipeline
    # and skips lines that do not fit the pattern.
    if ( $_ -match 'href="(.*?)">Date:\s*([\w\.]+)\s*([\w\:]+)\s*Item:\s*(.*)</a>' ) {
        $obj = New-Object psobject -Property @{
            "Link" = $matches[1]
            "Date" = $matches[2]
            "Time" = $matches[3]
            "Item" = $matches[4]
        }
        $obj
    }
} #| Export-Csv $output_file -NoTypeInformation
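To see what -match leaves in $matches, here is a minimal sketch run against a single line. The input line is hypothetical: the question does not show the raw HTML, so the anchor text and the dotted date format are assumptions chosen to fit the pattern (note that [\w\.]+ does not match hyphens, so a date written as 2013-05-19 would not match).

```powershell
# Hypothetical input line; the real file contents are not shown in the question.
$line = '<a href="http://www.domain.com/page.html?FeedDefault_bluepebbles&something">Date: 19.05.2013 13:30 Item: Blue Pebbles</a>'

$link = $date = $time = $item = $null
if ($line -match 'href="(.*?)">Date:\s*([\w\.]+)\s*([\w\:]+)\s*Item:\s*(.*)</a>') {
    # $matches[0] is the whole match; the capture groups start at index 1.
    $link = $matches[1]
    $date = $matches[2]
    $time = $matches[3]
    $item = $matches[4]
}

"$link | $date | $time | $item"
```

Copying the groups into variables right away is a deliberate choice here: the next successful -match overwrites $matches.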
The additional information can be extracted from $obj.Link with another -match and then added to the custom object via Add-Member:
if ( $obj.Link -match '\?Feed(?:Default)?_(.*?)&' ) {
    $obj | Add-Member -Type "NoteProperty" -Name "LinkValue" -Value $matches[1]
}
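One thing to watch when the result is later piped to Export-Csv: the CSV columns are taken from the first object, so if that object has no LinkValue property the column is dropped for every row. The sketch below is one possible way around this, not part of the original answer: it always adds the property (empty when nothing matches) and uses Select-Object to fix the column order. The first two URLs come from the sample output in the question; the third is a hypothetical link without a feed value.

```powershell
$links = @(
    'http://www.domain.com/page.html?FeedDefault_bluepebbles&something',
    'http://www.domain.com/page.html?Feed_redpebbles&something',
    'http://www.domain.com/page.html?other=1'   # hypothetical: no feed value
)

$rows = $links | ForEach-Object {
    $obj = New-Object psobject -Property @{ "Link" = $_ }

    # Default to an empty string so every object has the same properties.
    $value = ''
    if ($obj.Link -match '\?Feed(?:Default)?_(.*?)&') {
        $value = $matches[1]
    }
    $obj | Add-Member -Type NoteProperty -Name "LinkValue" -Value $value
    $obj
}

# Select-Object fixes the column order before the objects become CSV text.
$rows | Select-Object LinkValue, Link | ConvertTo-Csv -NoTypeInformation
```

ConvertTo-Csv is used only to show the result on screen; with a real output path, Export-Csv would take its place.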
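Also, since your input file appears to be an HTML file, you should consider using the InternetExplorer COM object, which gives you better control over the extracted tags than processing the file line by line.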
$ie = New-Object -COM "InternetExplorer.Application"
$ie.Visible = $false
$ie.Navigate("file:///$input_path")
# Wait until the page has finished loading; until then $ie.document may be $null.
while ( $ie.Busy ) { Start-Sleep -Milliseconds 100 }
$ie.document.getElementsByTagName("a") | % {
    # Testing -match in an if keeps its Boolean result out of the pipeline
    # and skips anchors whose text does not fit the pattern.
    if ( $_.innerText -match 'Date:\s*([\w\.]+)\s*([\w\:]+)\s*Item:\s*(.*)' ) {
        $obj = New-Object psobject -Property @{
            "Link" = $_.href
            "Date" = $matches[1]
            "Time" = $matches[2]
            "Item" = $matches[3]
        }
        if ( $obj.Link -match '\?Feed(?:Default)?_(.*?)&' ) {
            $obj | Add-Member -Type "NoteProperty" -Name "LinkValue" -Value $matches[1]
        }
        $obj
    }
}
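The wait loop matters. The code entered in the question calls getElementsByTagName immediately after Navigate, and the "null-valued expression" error it reports is consistent with the document not being available yet, although other causes cannot be ruled out from the error alone. As a rough sketch, the wait can be wrapped in a helper that also times out and checks for a missing document. Wait-IEDocument is a hypothetical name, not an existing cmdlet, and the 30-second default is an arbitrary assumption.

```powershell
# Hypothetical helper: waits for a browser object to finish loading and
# returns its document, or throws if that does not happen in time.
function Wait-IEDocument {
    param(
        $Browser,
        [int]$TimeoutSeconds = 30   # arbitrary default, adjust as needed
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)

    # ReadyState 4 means the page has finished loading.
    while ($Browser.Busy -or $Browser.ReadyState -ne 4) {
        if ((Get-Date) -gt $deadline) {
            throw "Timed out waiting for the page to load."
        }
        Start-Sleep -Milliseconds 100
    }

    if ($null -eq $Browser.Document) {
        throw "The page loaded but no document is available."
    }
    $Browser.Document
}

# Intended use on Windows with Internet Explorer available:
# $ie = New-Object -COM "InternetExplorer.Application"
# $ie.Navigate("file:///$input_path")
# $doc = Wait-IEDocument $ie
# $doc.getElementsByTagName("a") | % { ... }
# $ie.Quit()   # close the hidden browser instance when done
```

Failing with a clear message makes a load problem easier to tell apart from a problem in the extraction code.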