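I'm using PowerShell 5.0, and I have a .CSV file containing a list of siebelIds (about 5,000) that I want to search for. I want to search every folder and subfolder on the server for any file whose name contains an item from that list (a siebelId), e.g. the file names 32444167.pdf or 32444167.pdf.metadata.properties.xml.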
Sample CSV file:
32444167,ACME,4/15/2013
27721071,ACME,4/15/2013
27721072,ACME,4/15/2013
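I'm filtering on *.PDF and *.XML. I then want to copy the files found to a destination folder on the same server. The problem is that the folders and subfolders contain hundreds of thousands of files, and the code I wrote looks like it will take days to run. I'm not an expert, and I haven't written the most efficient PowerShell script. Any help would be appreciated.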
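Basically, the code works, but it runs very slowly when it processes folders containing hundreds of thousands of files. Calling Get-ChildItem every time I take a new item from the list seemed to be efficient.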
$PDFExtension = '.pdf'
$XMLExtension = '.pdf.metadata.properties.xml'
$source = 'C:\Temp\CSVtoXML'
$destination = 'C:\Temp\FindFiles\'
$strGetDate = Get-Date -UFormat "%Y-%m-%d %H:%M:%S"
$log = $destination + "FileCopyLog.txt"
$FileList = Import-Csv "C:\Temp\FindFiles\test.csv" -Delimiter "," -Header 'siebelId', 'companyCode', 'receivedDate'
# Enumerates the tree once; note that Select-Object -First caps the list at 100,000 files
$GetFiles = @(Get-ChildItem -Path $source -Recurse -File -Include *.xml, *.pdf) | Select-Object -First 100000
ForEach ($item in $FileList) {
    $siebelId = $item.siebelId + $PDFExtension
    $XMLFile = $item.siebelId + $XMLExtension
    # Scans all of $GetFiles again for every CSV row
    $FilterFiles = @($GetFiles) | Where-Object { $_.Name -eq $siebelId -or $_.Name -eq $XMLFile } #| Out-File $destination"FileCopyLog.csv"
    #Write-Host "Filtered Files: " $FilterFiles
    ForEach ($file in $FilterFiles) {
        $fileBase = $file.BaseName
        $fileExt = $file.Extension
        Write-Host "file: " $fileBase$fileExt
        If (-not ([string]::IsNullOrEmpty($file))) {
            if (!(Test-Path -Path $destination$fileBase$fileExt)) {
                Copy-Item $file.FullName -Destination $destination # Copies files
                Write-Host "File: [" $file "] has been copied to " $destination `n`r -ForegroundColor Yellow
                $strGetDate = Get-Date -UFormat "%Y-%m-%d %H:%M:%S"
                $LogValue = $strGetDate + ': ' + "Source: [" + $file + "] Destination: " + $destination
                Add-Content -Path $log -Value $LogValue
            } else {
                Write-Host "File: [" $file "] already exists in destination folder" `n`r -ForegroundColor Yellow
                $strGetDate = Get-Date -UFormat "%Y-%m-%d %H:%M:%S"
                $LogValue = $strGetDate + ': ' + "File: [" + $file + "] already exists in destination folder! "
                Add-Content -Path $log -Value $LogValue
            }
        } else {
            Write-Host "No File was copied!" `n`r -ForegroundColor Red
        }
    }
}
Write-Host 'Script has completed' -ForegroundColor Green
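Most of the time in the loop above goes into `Where-Object` walking the whole `$GetFiles` array once per CSV row, which is roughly 5,000 × 100,000 script-block calls. A minimal sketch of the usual alternative, assuming file names are unique enough to key on: enumerate once, build a hashtable keyed by file name, then do one lookup per ID and extension. The temporary folder and the non-matching file are made-up demo data.

```powershell
# Hypothetical demo data: a temporary tree with matching and non-matching files
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$sub = Join-Path $root 'sub'
New-Item -ItemType Directory -Path $sub -Force | Out-Null
foreach ($n in '32444167.pdf', '32444167.pdf.metadata.properties.xml', '99999999.pdf') {
    New-Item -ItemType File -Path (Join-Path $sub $n) | Out-Null
}

# Headerless CSV rows, parsed with the same -Header names as the question
$FileList = @'
32444167,ACME,4/15/2013
27721071,ACME,4/15/2013
'@ | ConvertFrom-Csv -Header 'siebelId', 'companyCode', 'receivedDate'

# 1. Enumerate the tree once and index every file by name (hashtable keys are case-insensitive)
$index = @{}
foreach ($f in Get-ChildItem -Path $root -Recurse -File -Filter '*.pdf*') {
    $index[$f.Name] = $f   # if a name occurs in several folders, the last one seen wins
}

# 2. Two hashtable lookups per CSV row instead of a full scan per row
$found = @(foreach ($item in $FileList) {
    foreach ($name in ($item.siebelId + '.pdf'), ($item.siebelId + '.pdf.metadata.properties.xml')) {
        if ($index.ContainsKey($name)) { $index[$name] }
    }
})

Remove-Item -Path $root -Recurse -Force   # clean up the demo tree
$found.Name
```

The `$found` objects are ordinary `FileInfo` objects, so they can be fed straight into the existing copy-and-log logic.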
The result I'm hoping for is for this process to finish in a few hours rather than days.
Answer 0 (score: 1)
Modified to use '.pdf.metadata.properties.xml' instead of XML, and to match those files by stripping '.pdf.metadata.properties' from the BaseName of the files found.
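For reference, a short sketch of how that stripping behaves on the two file shapes from the question. `BaseName` only drops the last extension, so the metadata file keeps `.pdf.metadata.properties`; since `-replace` takes a regular expression, the dots are escaped and the suffix is anchored here (an assumption, to avoid matching it mid-name).

```powershell
# BaseName drops only the last extension, so the metadata file keeps '.pdf.metadata.properties'
$names = '32444167.pdf', '32444167.pdf.metadata.properties.xml'
$ids = foreach ($n in $names) {
    $base = [System.IO.Path]::GetFileNameWithoutExtension($n)   # same value as FileInfo.BaseName
    # -replace is regex-based: escape the dots and anchor the suffix at the end
    $base -replace '\.pdf\.metadata\.properties$', ''
}
$ids
```

Both names reduce to `32444167`, so a single comparison against the CSV IDs covers the PDF and its metadata file.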
Edit
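Also, you can cut the script's copy time by first generating a list of the destination files and then filtering the files to be copied by fi, which reduces the time spent in the copy process.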
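A minimal sketch of the "list the destination once, then filter" idea, using a `HashSet[string]` with a case-insensitive comparer (Windows file names are case-insensitive) so each membership test is a hash lookup rather than a scan of the destination list. The folder and file names are hypothetical demo data.

```powershell
# Hypothetical source and destination folders for the demo
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$src = Join-Path $tmp 'src'
$dst = Join-Path $tmp 'dst'
New-Item -ItemType Directory -Path $src, $dst -Force | Out-Null
foreach ($n in '1.pdf', '2.pdf', '3.pdf') { New-Item -ItemType File -Path (Join-Path $src $n) | Out-Null }
New-Item -ItemType File -Path (Join-Path $dst '2.pdf') | Out-Null   # pretend this was copied earlier

# List the destination once and put the names into a case-insensitive set
$dstNames = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)
Get-ChildItem -Path $dst -File | ForEach-Object { [void]$dstNames.Add($_.Name) }

# Keep only the source files whose name is not in the destination yet, then copy them
$toCopy = @(Get-ChildItem -Path $src -File | Where-Object { -not $dstNames.Contains($_.Name) })
$toCopy | ForEach-Object { Copy-Item -LiteralPath $_.FullName -Destination $dst }

$copied = @($toCopy.Name | Sort-Object)
Remove-Item -Path $tmp -Recurse -Force   # clean up the demo folders
$copied
```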
$Exts = @('*.pdf', '*.pdf.metadata.properties.xml')
$source = 'C:\Temp\CSVtoXML'
$destination = 'C:\Temp\FindFiles\'
$strGetDate = Get-Date -UFormat "%Y-%m-%d %H:%M:%S"
$log = "$($destination)FileCopyLog.txt"
$SiebelIDFile = "$($destination)test.csv"
$SiebelIDImport = Import-Csv $SiebelIDFile -Delimiter "," -Header 'siebelId', 'companyCode', 'receivedDate'
# Pull the ID column out once instead of re-enumerating it for every file
$SiebelIDs = $SiebelIDImport.siebelId
$SRC_Matched_Exts = $( $Exts | % { Get-ChildItem -Path $source -Recurse -File -Filter $_ } )
# Presto, we can filter the list using the Siebel IDs
# ('.pdf.metadata.properties' is stripped so the XML files match on the bare ID)
$Results = $SRC_Matched_Exts | ? { ($_.BaseName -replace '\.pdf\.metadata\.properties$', '') -in $SiebelIDs }
# Confirm results by outputting the first 100
$Results | select -First 100 | FT -Property BaseName, FullName -Auto
# Get destination files to compare:
$Dst_Matched_Exts = $( $Exts | % { Get-ChildItem -Path $destination -Recurse -File -Filter $_ } )
$DstNames = @($Dst_Matched_Exts.Name)
# Filter to only the source files not in the destination:
$Src_Files_MissingFromDst = @($Results | ? { $_.Name -notin $DstNames })
$Src_Files_AlreadyInDst = @($Results | ? { $_.Name -in $DstNames })
# Output some of the files we won't copy because they already exist in dst:
Write-Host "
Output some of the files we won't copy because they already exist in dst:
$($Src_Files_AlreadyInDst | select -First 100 | FT -Property BaseName, FullName -Auto | Out-String)" -ForegroundColor Red
# Output some of the files we will copy:
Write-Host "
Output some of the files we will copy:
$($Src_Files_MissingFromDst | select -First 100 | FT -Property BaseName, FullName -Auto | Out-String)" -ForegroundColor Yellow
$Count = 0
$Total = $Src_Files_MissingFromDst.Count
# Loop over the files and copy them to the destination:
$Src_Files_MissingFromDst | % {
    $Count += 1
    Copy-Item $_.FullName -Destination $destination # Copies files
    Add-Content -Path $log -Value "$(Get-Date -UFormat '%Y-%m-%d %H:%M:%S'): Source File # ${Count}: [$($_.FullName)] Destination: $destination"
    # Update the copy progress every 10 files, and on the last file
    if (($Count % 10) -eq 0 -or $Count -eq $Total) {
        $Pct = [math]::Round(($Count / $Total) * 100, 1)
        Write-Progress -Activity "======== Copying to $destination" -Status "## $Pct% Complete!" -PercentComplete $Pct
        Write-Host "File # ${Count}: [ $($_.FullName) ] has been copied to $destination" -ForegroundColor Green
    }
}
Now that the file copy/move is driven by a collection of matched files, it makes sense to use parallel processes to speed it up.
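PowerShell 5.0 has no `ForEach-Object -Parallel` (that arrived in PowerShell 7), so one way to try parallel copies there is a runspace pool. A minimal sketch, assuming the list of files to copy has already been built; the limit of 4 runspaces is an arbitrary choice, and when source and destination are on the same disk, parallel copies may gain little because the disk rather than the CPU is usually the bottleneck. Demo paths are hypothetical.

```powershell
# Hypothetical demo files to copy
$tmp = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$src = Join-Path $tmp 'src'
$dst = Join-Path $tmp 'dst'
New-Item -ItemType Directory -Path $src, $dst -Force | Out-Null
foreach ($i in 1..20) { Set-Content -Path (Join-Path $src "$i.pdf") -Value "file $i" }
$files = Get-ChildItem -Path $src -File

# A pool of at most 4 runspaces shared by all copy tasks
$pool = [runspacefactory]::CreateRunspacePool(1, 4)
$pool.Open()

# Each task copies one file with File.Copy (no overwrite)
$copyScript = {
    param($sourcePath, $targetDir)
    $target = [System.IO.Path]::Combine($targetDir, [System.IO.Path]::GetFileName($sourcePath))
    [System.IO.File]::Copy($sourcePath, $target, $false)
}

# Queue one task per file; BeginInvoke returns immediately
$tasks = foreach ($f in $files) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript($copyScript).AddArgument($f.FullName).AddArgument($dst)
    [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
}

# Wait for every task, surface any errors, and release resources
foreach ($t in $tasks) {
    [void]$t.Shell.EndInvoke($t.Handle)
    if ($t.Shell.HadErrors) { Write-Warning ($t.Shell.Streams.Error | Out-String) }
    $t.Shell.Dispose()
}
$pool.Close()

$copiedCount = @(Get-ChildItem -Path $dst -File).Count
Remove-Item -Path $tmp -Recurse -Force   # clean up the demo folders
$copiedCount
```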
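Loops are generally slower than filtering with a select statement, and using a command's built-in filter (such as -Filter) is a better path than filtering the results afterwards, because the filtering happens at a lower level while the data is being collected.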
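To see the provider-side versus after-the-fact difference, this sketch runs the same search with `-Filter` and with `Where-Object` on a made-up temporary folder and times both with a `Stopwatch`. The timings depend entirely on the machine and the tree, so treat them as illustrative only.

```powershell
# Hypothetical demo tree: 500 .txt files and 50 .pdf files in one folder
$root = Join-Path ([System.IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $root | Out-Null
foreach ($i in 1..500) { New-Item -ItemType File -Path (Join-Path $root "$i.txt") | Out-Null }
foreach ($i in 1..50) { New-Item -ItemType File -Path (Join-Path $root "$i.pdf") | Out-Null }

$sw = [System.Diagnostics.Stopwatch]::StartNew()
# -Filter is handed to the FileSystem provider, so non-matching names are skipped during enumeration
$byFilter = @(Get-ChildItem -Path $root -Recurse -File -Filter '*.pdf')
$filterMs = $sw.ElapsedMilliseconds

$sw.Restart()
# Where-Object receives every file object and tests it in script afterwards
$byWhere = @(Get-ChildItem -Path $root -Recurse -File | Where-Object { $_.Extension -eq '.pdf' })
$whereMs = $sw.ElapsedMilliseconds

Remove-Item -Path $root -Recurse -Force   # clean up the demo tree
'{0} files via -Filter in {1} ms; {2} via Where-Object in {3} ms' -f $byFilter.Count, $filterMs, $byWhere.Count, $whereMs
```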
Answer 1 (score: 0)
Try:
$(Get-ChildItem -path $source -Recurse -File -Filter *.xml
Get-ChildItem -path $source -Recurse -File -Filter *.pdf)
Answer 2 (score: 0)
The siebelID seems to have 8 digits, which you can use to select the files.
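A quick way to check such a pattern before pointing it at the server: this sketch applies a named-group regex for the 8-digit ID to a few sample names and keeps the captured IDs. The trailing `$` anchor is an added assumption so that names like `32444167.pdf.bak` are rejected; the non-matching sample names are made up.

```powershell
# 8-digit ID, '.pdf', optionally '.metadata.properties.xml'; the trailing $ anchor is an assumption
$RE = '^(?<siebelID>\d{8})\.pdf(\.metadata\.properties\.xml)?$'
$names = '32444167.pdf',
         '32444167.pdf.metadata.properties.xml',
         '32444167.pdf.bak',
         '1234567.pdf',
         'notes.txt'
$ids = foreach ($n in $names) {
    if ($n -match $RE) { $Matches.siebelID }   # $Matches is filled by a successful -match
}
$ids
```

Only the first two names match, and both yield `32444167`.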
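I'm not sure which is more efficient for checking whether an ID exists in $Filelist. Output should be reduced to the absolute minimum needed to speed up processing.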
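On the efficiency question: `-in` compares against the array item by item, and writing `$FileList.siebelID` inline inside `Where-Object` also re-enumerates the CSV objects for every file. A minimal sketch of the alternative, assuming the IDs are compared as exact strings: load them into a `HashSet[string]` once and call `Contains`. The 5,000 IDs here are synthetic.

```powershell
# Synthetic stand-in for the CSV: 5,000 eight-digit IDs
$FileList = 1..5000 | ForEach-Object { [pscustomobject]@{ siebelId = (30000000 + $_).ToString() } }

# Build the lookup set once; Contains is a hash lookup rather than a linear scan
$idSet = [System.Collections.Generic.HashSet[string]]::new([string[]]$FileList.siebelId)

$hit = $idSet.Contains('30004999')    # inside the synthetic range
$miss = $idSet.Contains('29999999')   # outside it
"$hit $miss"
```

In the script below, the test would then become `$idSet.Contains($Matches.siebelID)` instead of `$Matches.siebelID -in $FileList.siebelID`.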
The following script also removes the duplicated construction of $LogValue; it is now built once, after the if/else.
## Q:\Test\2019\08\26\SO_57658091.ps1
$source = 'Q:\Test\2019' # 'C:\Temp\CSVtoXML'
$target = 'A:\Test\2019' # 'C:\Temp\FindFiles\'
$log = Join-Path $target "FileCopyLog.txt"
# 8-digit ID, ".pdf", optionally ".metadata.properties.xml", and nothing after that
$RE = '^(?<siebelID>\d{8})\.pdf(\.metadata\.properties\.xml)?$'
$FileList = Import-Csv "C:\Temp\FindFiles\test.csv" -Header siebelId,companyCode,receivedDate
# Extract the IDs once rather than re-enumerating $FileList for every file
$IDs = $FileList.siebelId

Get-ChildItem -Path $source -Recurse -File -Filter '*.pdf*' |
    Where-Object { ($_.Name -match $RE) -and
                   ($Matches.siebelID -in $IDs) } |
    ForEach-Object {
        if (!(Test-Path (Join-Path $target $_.Name))) {
            Copy-Item $_.FullName -Destination $target # Copies files
            $Copied = 'copied to {0}' -f $target
        } else {
            $Copied = 'present in destination'
        }
        $LogValue = '{0}: File: [{1}] {2}' -f (Get-Date -UFormat "%Y-%m-%d %H:%M:%S"), $_.Name, $Copied
        # $LogValue # optionally output, but that slows things down.
        Add-Content -Path $log -Value $LogValue
    }
Write-Host 'Script has completed' -ForegroundColor Green
A slightly modified version, run against my test folder of stored SO scripts (whose names also happen to contain 8-digit numbers), produces this FileCopyLog.txt:
2019-08-26 17:46:03: File: [SO_55464728.ps1] copied to A:\Test\2019
2019-08-26 17:46:03: File: [SO_55569099.ps1] copied to A:\Test\2019
2019-08-26 17:46:03: File: [SO_55575835.cmd] copied to A:\Test\2019
2019-08-26 17:46:03: File: [SO_55575543.ps1] copied to A:\Test\2019
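The script that produced this log calls `Add-Content` once per file, which opens and closes the log every time. A minimal sketch of an alternative, assuming the log only has to be written while the loop runs: keep one `System.IO.StreamWriter` open in append mode for the whole loop and dispose it in `finally`. The log path is hypothetical and the file names are taken from the sample log.

```powershell
# Hypothetical log location; file names taken from the sample log
$log = Join-Path ([System.IO.Path]::GetTempPath()) ('FileCopyLog_{0}.txt' -f [guid]::NewGuid())
$names = 'SO_55464728.ps1', 'SO_55569099.ps1', 'SO_55575835.cmd'

# Open the log once in append mode ($true) and reuse the writer for every line
$writer = [System.IO.StreamWriter]::new($log, $true)
try {
    foreach ($name in $names) {
        $writer.WriteLine(('{0}: File: [{1}] copied' -f (Get-Date -UFormat '%Y-%m-%d %H:%M:%S'), $name))
    }
}
finally {
    $writer.Dispose()   # flushes buffered lines and closes the file
}

$lines = Get-Content -Path $log
Remove-Item -Path $log   # clean up the demo log
$lines
```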