We have a directory of 3000+ HTML files that we are migrating to a SharePoint site, and we need to clean up some of the data first.
Specifically:
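The files contain the XML declaration header <?xml version="1.0" encoding="utf-8"?>. We plan to delete that header line.
The files refer to either foo1.htm or foo.htm. We want to change both to: http:\\sharepoint.site\home.aspx
The call that adds the "Show" button should be replaced with '' (that is, removed).
So far, this is my function: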
function scrubXMLHeader {
    # All files in the backup folder, and the .htm files selected for processing
    $srcfiles = Get-ChildItem $backupGuidePath -Filter "*.htm"
    $srcfilecount = (Get-ChildItem $backupGuidePath).Count
    $selfilecount = $srcfiles.Count
    # Input and Output Path variables
    $sourcePath = $backupGuidePath
    $destinationPath = $workScrubPath
    "Input From: $($sourcePath)" | Log $messageLog -echo
    " Output To: $($destinationPath)" | Log $messageLog -echo
    #
    $temp01 = Get-ChildItem $sourcePath -Filter "*.htm"
    foreach($file in $temp01)
    {
        # Join-Path supplies the "\" between folder and file name if it is missing
        $outfile = Join-Path $destinationPath $file.Name
        # Keep every line that does not contain the XML declaration
        $content = Get-Content $file.FullName | Where-Object { $_ -notmatch "<\?xml[^>]+>" }
        Set-Content -Path $outfile -Force -Value $content
    }
}
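The key step in the loop is the Where-Object filter: Get-Content returns the file as an array of lines, and only the lines that do not contain an XML declaration are passed on to Set-Content. A minimal sketch of that filter on made-up in-memory lines (the sample HTML is hypothetical):

```powershell
# Hypothetical lines standing in for the contents of one .htm file
$lines = @(
    '<?xml version="1.0" encoding="utf-8"?>'
    '<html>'
    '<body>Guide page</body>'
    '</html>'
)

# Same pattern as in scrubXMLHeader: drop any line containing "<?xml ...>"
$scrubbed = $lines | Where-Object { $_ -notmatch '<\?xml[^>]+>' }

$scrubbed
```

Because the filter discards the whole line, this assumes the declaration sits on its own line, as described above; any other markup sharing that line would be dropped too.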
I'd like to add the following two edits for each document:
-replace '("foo.htm", "", ">", "Home", "foo1.htm")', '("http:\\sharepoint.site\home.aspx", "", ">", "Home", "http:\\sharepoint.site\home.aspx")'
-replace 'addButton("show",BTN_TEXT,"Show","","","","",0,0,"","","");', ''
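I'm not sure how to combine these into a single statement, so that I open the file, make the changes, and save and close it once, instead of three separate open-edit-save/close transactions. I'm also not sure, with all the quotes and commas, what the best way to escape these characters is, or whether single quotes around the whole string are enough.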
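On the escaping question: single quotes stop PowerShell itself from expanding $variables and backtick escapes, so the embedded double quotes and commas need nothing extra. However, -replace passes its pattern to the .NET regular-expression engine, where ( ) . and ? are metacharacters. In the replacement string only $ is special, so the URL text needs no escaping. A minimal sketch comparing an unescaped pattern, [regex]::Escape(), and the literal String.Replace() method:

```powershell
$old = '("foo.htm", "", ">", "Home", "foo1.htm")'
$new = '("http:\\sharepoint.site\home.aspx", "", ">", "Home", "http:\\sharepoint.site\home.aspx")'
$btn = 'addButton("show",BTN_TEXT,"Show","","","","",0,0,"","","");'

# Unescaped: "( ... )" is read as a capture group, so only the text between
# the parentheses is replaced and the result ends up with doubled parentheses
$bad1 = $old -replace $old, $new

# Unescaped: the parentheses are again a group, so the pattern expects
# addButton"show"... with no "(" and never matches; the line is unchanged
$bad2 = $btn -replace $btn, ''

# Escaped: [regex]::Escape turns every metacharacter into a literal
$good1 = $old -replace [regex]::Escape($old), $new
$good2 = $btn -replace [regex]::Escape($btn), ''

# Literal alternative: String.Replace does no regex processing at all
# (it is case-sensitive, whereas -replace is case-insensitive by default)
$good3 = $old.Replace($old, $new)
```

Either [regex]::Escape() with -replace or String.Replace() avoids hand-escaping; -replace keeps the option of mixing in real patterns such as the XML-header regex.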
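I understand that "asking regexes to parse arbitrary HTML is like asking Paris Hilton to write an operating system, it's sometimes appropriate to parse a limited, known set of HTML", but my toolset is limited to PowerShell, and I'm trying to understand the best way to add the two -replace lines to the existing $content variable. Should they be separated by commas inside the curly braces, or piped into each other?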
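Comparison operators such as -notmatch and -replace can be chained one after another in a single expression, with neither commas nor pipes between them; each operator works on the result of the previous one. When the left-hand side is an array of lines, -notmatch returns only the lines that do not match, and -replace is applied to every remaining element. A minimal sketch on made-up in-memory lines:

```powershell
# Made-up lines standing in for the contents of one .htm file
$lines = @(
    '<?xml version="1.0" encoding="utf-8"?>'
    '<p>Intro</p>'
    'addButton("show",BTN_TEXT,"Show","","","","",0,0,"","","");'
)

# Filter out the XML declaration, then blank out the button call, in one expression
$result = $lines -notmatch '<\?xml[^>]+>' -replace [regex]::Escape('addButton("show",BTN_TEXT,"Show","","","","",0,0,"","","");'), ''

$result.Count   # 2: the button line is now an empty string, not a removed line
```

Replacing with '' leaves an empty line behind. Appending -ne '' to the chain would drop empty elements, but it would also drop blank lines that were already in the file.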
Is the following the best strategy, or is there something better?
$content = Get-Content $file.Fullname | ? {$_ -notmatch "<\?xml[^>]+>",
-replace '("foo.htm", "", ">", "Home", "foo1.htm")', '("http:\\sharepoint.site\home.aspx", "", ">", "Home", "http:\\sharepoint.site\home.aspx")',
-replace 'addButton("show",BTN_TEXT,"Show","","","","",0,0,"","","");', '' }
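The proposed snippet cannot work as written: the Where-Object script block only decides whether each line is kept, and the -replace pieces after the commas have no left-hand operand, so they are not valid PowerShell. If a pipeline reads better than chained operators, one option is to filter with Where-Object and transform with ForEach-Object. A minimal sketch, again on made-up lines:

```powershell
$xmlHeader = '<\?xml[^>]+>'
$oldNav    = [regex]::Escape('("foo.htm", "", ">", "Home", "foo1.htm")')
$newNav    = '("http:\\sharepoint.site\home.aspx", "", ">", "Home", "http:\\sharepoint.site\home.aspx")'
$button    = [regex]::Escape('addButton("show",BTN_TEXT,"Show","","","","",0,0,"","","");')

# Made-up lines standing in for the contents of one .htm file
$lines = @(
    '<?xml version="1.0" encoding="utf-8"?>'
    '("foo.htm", "", ">", "Home", "foo1.htm")'
    'addButton("show",BTN_TEXT,"Show","","","","",0,0,"","","");'
)

# Where-Object decides which lines survive; ForEach-Object rewrites each surviving line
$content = $lines |
    Where-Object { $_ -notmatch $xmlHeader } |
    ForEach-Object { $_ -replace $oldNav, $newNav -replace $button, '' }
```

In the question's loop, $lines would be replaced by Get-Content $file.FullName, followed by the existing Set-Content call, so each file is still read and written once. Because the pipeline processes one line at a time, it also handles one-line files without any array wrapper.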
Answer:
If I'm reading the question correctly, I think this might do what you want:
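# $Regex0 is a real pattern; $Regex1 and $Regex2 are escaped so ( ) . ? match literally
$Regex0 = '<\?xml[^>]+>'
$Regex1 = [regex]::Escape('("foo.htm", "", ">", "Home", "foo1.htm")')
$Replace1 = '("http:\\sharepoint.site\home.aspx", "", ">", "Home", "http:\\sharepoint.site\home.aspx")'
$Regex2 = [regex]::Escape('addButton("show",BTN_TEXT,"Show","","","","",0,0,"","","");')
foreach($file in $temp01)
{
    $outfile = Join-Path $destinationPath $file.Name
    # On an array, -notmatch returns the non-matching lines and each -replace runs on every line;
    # @() keeps a one-line file an array, so -notmatch filters instead of returning True/False
    @(Get-Content $file.FullName) -notmatch $Regex0 -replace $Regex1,$Replace1 -replace $Regex2,'' |
        Set-Content -Path $outfile -Force
}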
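The same chain of -notmatch and -replace operators can be packaged as a function that works on the lines of one file, which makes it easy to test without touching the disk. A minimal sketch; the name Convert-GuideContent and its -Lines parameter are hypothetical, not from the question or answer. The @() wrapper matters because Get-Content returns a plain string for a one-line file, and -notmatch on a single string returns $true or $false instead of the line.

```powershell
function Convert-GuideContent {
    # Hypothetical helper: scrubs the lines of one .htm file in memory
    param([string[]] $Lines)

    $xmlHeader = '<\?xml[^>]+>'
    $oldNav    = [regex]::Escape('("foo.htm", "", ">", "Home", "foo1.htm")')
    $newNav    = '("http:\\sharepoint.site\home.aspx", "", ">", "Home", "http:\\sharepoint.site\home.aspx")'
    $button    = [regex]::Escape('addButton("show",BTN_TEXT,"Show","","","","",0,0,"","","");')

    # Drop the XML declaration line, rewrite the links, blank out the button call
    @($Lines) -notmatch $xmlHeader -replace $oldNav, $newNav -replace $button, ''
}
```

Per file, the call would be Convert-GuideContent -Lines (Get-Content $file.FullName) | Set-Content -Path $outfile -Force, so each file is opened once for reading and once for writing.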