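I have 2 CSV files that I have been asked to merge on rows where the values in the first column match. Either file may contain duplicate values; when it does, an additional row should be created for each duplicate. If no match is found, the output should say that the value was not matched.
I am using the following code, which works except for handling the duplicate values: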
Function GetFirstColumnNameFromFile
{
    Param ($CsvFileWithPath)
    $FirstFileFirstColumnTitle = ((Get-Content $CsvFileWithPath -TotalCount 2 | ConvertFrom-Csv).psobject.properties | ForEach-Object {$_.name})[0]
    Write-Output $FirstFileFirstColumnTitle
}
Function CreateMergedFileWithCsv2ColumnOneColumn
{
    Param ($firstColumnFirstFile, $FirstFileFirstColumnTitle, $firstFile, $secondFile, $resultsFile)
    Write-Host "Creating hash table with columns values `"Csv2ColumnOne`" `"Csv2ColumnTwo`" From $secondFile"
    $hashColumnOneColumnTwo2ndFile = @{}
    Import-Csv $secondFile | Where-Object {$firstColumnFirstFile -contains $_.'Csv2ColumnOne'} | ForEach-Object {$hashColumnOneColumnTwo2ndFile[$_.'Csv2ColumnOne'] = $_.Csv2ColumnTwo}
    Write-Host "Complete."
    Write-Host "Creating Merge file with file $firstFile and column `"Csv2ColumnTwo`" from file $secondFile"
    Import-Csv $firstFile | Select-Object *, @{n='Csv2ColumnOne'; e={
        if ($hashColumnOneColumnTwo2ndFile.ContainsKey($_.$FirstFileFirstColumnTitle)) {
            $hashColumnOneColumnTwo2ndFile[$_.$FirstFileFirstColumnTitle]
        } Else {
            'Not Found'
        }}} | Export-Csv $resultsFile -NoTypeInformation -Force
    Write-Host "Complete."
}
Function MatchFirstTwoColumnsTwoFilesAndCombineOtherColumnsOneFile
{
    Param ($firstFile, $secondFile, $resultsFile)
    [string]$FirstFileFirstColumnTitle = GetFirstColumnNameFromFile $firstFile
    $FirstFileFirstColumn = Import-Csv $firstFile | Where-Object {$_.$FirstFileFirstColumnTitle} | Select-Object -ExpandProperty $FirstFileFirstColumnTitle
    CreateMergedFileWithCsv2ColumnOneColumn $FirstFileFirstColumn $FirstFileFirstColumnTitle $firstFile $secondFile $resultsFile
}
Function Main
{
    $firstFile = 'C:\Scripts\Tests\test1.csv'
    $secondFile = 'C:\Scripts\Tests\test2.csv'
    $resultsFile = 'C:\Scripts\Tests\testResults.csv'
    MatchFirstTwoColumnsTwoFilesAndCombineOtherColumnsOneFile $firstFile $secondFile $resultsFile
}
Main
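As far as I can tell, the duplicates get lost at the hash table assignment in CreateMergedFileWithCsv2ColumnOneColumn: a hash table holds one value per key, so a later row with the same key replaces the earlier one. The following is a minimal sketch of that behavior on its own. The two in-memory rows and the variable names `$rows` and `$lookup` are made up for illustration and are not part of my script.

```powershell
# Hypothetical stand-in for two rows of the second file that share a key.
$rows = @(
    [pscustomobject]@{ Csv2ColumnOne = '1234'; Csv2ColumnTwo = 'abc' }
    [pscustomobject]@{ Csv2ColumnOne = '1234'; Csv2ColumnTwo = 'asd' }
)

# Same assignment pattern as in the script: one value stored per key.
$lookup = @{}
$rows | ForEach-Object { $lookup[$_.Csv2ColumnOne] = $_.Csv2ColumnTwo }

$lookup.Count      # 1, because both rows share the key 1234
$lookup['1234']    # asd, the later row replaced abc
```

So whichever duplicate comes last in the second file is the only one that can reach the results file.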
The contents of the first CSV file are:
firstName,secondName
1234,Value1
2345,Value1
3456,Value1
4567,Value1
7645,Value3
The contents of the second CSV file are:
Csv2ColumnOne,Csv2ColumnTwo,Csv2ColumnThree
1234,abc,Value1
1234,asd,Value1
3456,qwe,Value1
4567,mnb,Value1
The result is:
"firstName","secondName","Csv2ColumnOne"
"1234","Value1","asd"
"2345","Value1","Not Found"
"3456","Value1","qwe"
"4567","Value1","mnb"
"7645","Value3","Not Found"
Because the second file contains the duplicate value 1234, the results file should instead be:
"firstName","secondName","Csv2ColumnOne"
"1234","Value1","abc"
"1234","Value1","asd"
"2345","Value1","Not Found"
"3456","Value1","qwe"
"4567","Value1","mnb"
"7645","Value3","Not Found"
Is there a way I can do this?
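One direction that might work, sketched here under assumptions rather than as a finished solution: keep a list of values per key instead of a single value, then emit one output row for every value in that list. The function name `Merge-CsvOnFirstColumn` and its parameters are hypothetical; the column names `Csv2ColumnOne` and `Csv2ColumnTwo` and the 'Not Found' text come from the script above. The sample data is inlined with `ConvertFrom-Csv` so the sketch runs without the files; `Import-Csv` on the real paths should be able to take its place.

```powershell
function Merge-CsvOnFirstColumn {
    param ($FirstRows, $SecondRows, $NotFoundText = 'Not Found')

    # Name of the first column in the first file (firstName in the sample).
    $firstKey = ($FirstRows[0].psobject.Properties | Select-Object -First 1).Name

    # Collect every Csv2ColumnTwo value per key, instead of only the last one.
    $lookup = @{}
    foreach ($row in $SecondRows) {
        $key = $row.Csv2ColumnOne
        if ($null -eq $key) { continue }
        if (-not $lookup.ContainsKey($key)) { $lookup[$key] = @() }
        $lookup[$key] += $row.Csv2ColumnTwo
    }

    # Emit one output row per matching value, or a single 'Not Found' row.
    foreach ($row in $FirstRows) {
        $key = $row.$firstKey
        if ($null -ne $key -and $lookup.ContainsKey($key)) {
            $values = $lookup[$key]
        } else {
            $values = @($NotFoundText)
        }
        foreach ($value in $values) {
            $row | Select-Object *, @{ n = 'Csv2ColumnOne'; e = { $value } }
        }
    }
}

$first = @'
firstName,secondName
1234,Value1
2345,Value1
3456,Value1
4567,Value1
7645,Value3
'@ | ConvertFrom-Csv

$second = @'
Csv2ColumnOne,Csv2ColumnTwo,Csv2ColumnThree
1234,abc,Value1
1234,asd,Value1
3456,qwe,Value1
4567,mnb,Value1
'@ | ConvertFrom-Csv

$merged = @(Merge-CsvOnFirstColumn $first $second)
$merged | ConvertTo-Csv -NoTypeInformation
```

With the sample data this should produce two rows for 1234 (abc and asd) and keep the 'Not Found' rows for 2345 and 7645. Duplicates in the first file would also be kept, since each of its rows is processed independently. Piping `$merged` to `Export-Csv $resultsFile -NoTypeInformation -Force` would write the file as before.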