Question

I have a configuration.csv that holds template data like this:

| path       | item  | value  | type |
|------------|-------|--------|------|
| some/path  | item1 | value1 | ALL  |
| some/path  | item2 | UPDATE | ALL  |
| other/path | item1 | value2 | SOME |

and a customization.csv that holds service-specific configuration:

| path       | item  | value  | type |
|------------|-------|--------|------|
| some/path  | item2 | value3 | ALL  |
| new/path   | item3 | value3 | SOME |

My goal is to merge them and end up with a result like this:

| path       | item  | value  | type |
|------------|-------|--------|------|
| some/path  | item1 | value1 | ALL  |
| some/path  | item2 | value3 | ALL  |
| other/path | item1 | value2 | SOME |
| new/path   | item3 | value3 | SOME |

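The merge should add any new entries and update any existing ones. No single column can be used as a unique identifier; path and item have to be combined, because only together are they guaranteed to be unique.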

Answer 1

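After a lot of searching, I figured that the easiest way to manipulate the entries without re-creating a management framework is through a hashtable. Along the way I had to account for two edge cases:

- additional commas in the values
- empty values

The final solution I arrived at is: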

$configuration = Import-Csv .\configuration.csv
$customization = Import-Csv .\customization.csv
$merged = New-Object System.Collections.ArrayList
$hashTable = @{}

#initializing the hashTable with the defaults
foreach ($entry in $configuration)
{
    $hashTable[$entry.path + ',' + $entry.item] = $entry.value + ',' + $entry.type
}

#updating the hashTable with customization that add or overwrite existing entries
foreach ($entry in $customization)
{
    $hashTable[$entry.path + ',' + $entry.item] = $entry.value + ',' + $entry.type
}

#the regex handles multiple commas and empty values.
#It returns an empty string before and after group so we start from 1 
foreach ($key in $hashTable.keys)
{
    $psobject = [PSCustomObject]@{
        path  = ($key -split '(.*),(.*)')[1]
        item  = ($key -split '(.*),(.*)')[2]
        value = ($hashTable[$key] -split '(.*),(.*)')[1]
        type  = ($hashTable[$key] -split '(.*),(.*)')[2]
    }
    [void] $merged.Add($psobject)
}
Write-Output $merged

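After the import, I convert configuration.csv into a hashtable whose keys are composed of path and item. I then do the same for customization.csv using the same hashtable, which overwrites the value of any existing key or adds the key as a new entry.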

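The third loop converts the hashtable into PSCustomObjects, similar to what Import-Csv produces. I split each key and each value while accounting for multiple commas and empty values.
Note: the regex splits on the last occurrence of the separator (a comma here, but you can choose anything). If you want to split on the first occurrence instead, you can use (.*?),(.*). In my case, only the value column can contain instances of the separator.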
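To make the greedy versus lazy behaviour concrete, here is a minimal sketch that runs both patterns over a few made-up "value,type" strings. The sample strings and the property names on the output objects are illustrative assumptions, not part of the original script.

```powershell
# Illustrative strings only; they mimic the "value,type" payloads stored in the hashtable.
$samples = 'value1,ALL', 'a,b,c,SOME', ',ALL'

$results = foreach ($s in $samples) {
    # Greedy first group: the match covers the whole string, so -split returns
    # '', <group 1>, <group 2>, '' and the split point is the LAST comma.
    $last  = $s -split '(.*),(.*)'
    # Lazy first group: the split point moves to the FIRST comma.
    $first = $s -split '(.*?),(.*)'

    [PSCustomObject]@{
        Input      = $s
        LastLeft   = $last[1]
        LastRight  = $last[2]
        FirstLeft  = $first[1]
        FirstRight = $first[2]
    }
}

$results | Format-Table
```

For 'a,b,c,SOME' the greedy pattern yields 'a,b,c' and 'SOME', while the lazy pattern yields 'a' and 'b,c,SOME'. For ',ALL' both patterns return an empty left part and 'ALL'.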

If the CSV had a unique column, a solution similar to this answer could be used.

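Another alternative is to build the key from all columns combined, which would filter out every duplicate row in the CSV, but depending on the values in the columns, splitting the key back apart can get tricky.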
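One way to sidestep the splitting problem altogether is to keep the imported row object as the hashtable value and use the composite string only as a lookup key. The sketch below is a variation on the approach above, not the original script: it assumes inline sample data in place of the two CSV files, a hypothetical '|' key separator that never occurs in path or item, and an [ordered] dictionary so the output keeps the template order.

```powershell
# Inline sample rows stand in for Import-Csv .\configuration.csv and .\customization.csv.
$configuration = @'
path,item,value,type
some/path,item1,value1,ALL
some/path,item2,UPDATE,ALL
other/path,item1,value2,SOME
'@ | ConvertFrom-Csv

$customization = @'
path,item,value,type
some/path,item2,value3,ALL
new/path,item3,value3,SOME
'@ | ConvertFrom-Csv

# [ordered] keeps insertion order: template rows first, new rows appended at the end.
$rows = [ordered]@{}

foreach ($entry in @($configuration) + @($customization)) {
    # Hypothetical key format; '|' is assumed never to appear in path or item.
    $key = '{0}|{1}' -f $entry.path, $entry.item
    # A later row with the same key replaces the earlier one in place.
    $rows[$key] = $entry
}

$merged = @($rows.Values)
$merged
```

Because the values are the original row objects, commas or empty strings in any column need no special handling, and $merged can be piped straight to Export-Csv.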

Answer 2

I'd suggest using Compare-Object. Since the values from customization.csv are the ones that should persist, pass that file's rows as -ReferenceObject:

## Q:\Test\2019\03\01\SO_54948111.ps1

$conf = Import-Csv '.\configuration.csv'
$cust = Import-Csv '.\customization.csv'

$NewData = Compare-Object -ref $cust -diff $conf -Property path,item -PassThru -IncludeEqual|
    Select-Object -Property * -ExcludeProperty SideIndicator

$NewData
$NewData |Export-Csv '.\NewData.csv' -NoTypeInformation

Sample output

> Q:\Test\2019\03\01\SO_54948111.ps1

path       item  value  type
----       ----  -----  ----
some/path  item2 value3 ALL
some/path  item1 value1 ALL
other/path item1 value2 SOME
new/path   item3 value3 SOME
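The output above has SideIndicator stripped. As a rough illustration of why the customization rows win, the following sketch keeps that property so the origin of each row stays visible. It uses inline sample data instead of the CSV files, which is an assumption made only to keep the example self-contained.

```powershell
# Inline sample rows stand in for the two CSV files.
$conf = @'
path,item,value,type
some/path,item1,value1,ALL
some/path,item2,UPDATE,ALL
other/path,item1,value2,SOME
'@ | ConvertFrom-Csv

$cust = @'
path,item,value,type
some/path,item2,value3,ALL
new/path,item3,value3,SOME
'@ | ConvertFrom-Csv

# Rows are matched on path + item only. With -PassThru, a matching pair ('==')
# is emitted as the reference object, which here is the customization row.
$compared = Compare-Object -ReferenceObject $cust -DifferenceObject $conf `
    -Property path, item -PassThru -IncludeEqual

$compared | Select-Object path, item, value, type, SideIndicator
```

Rows marked '==' exist in both files and carry the customization value, '<=' marks rows found only in customization.csv, and '=>' marks rows found only in configuration.csv.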

Answer 3

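Your idea of 'using the same hashTable which overwrites any existing key values or adds them as new' only works if the path, item combination is unique on each side, because you would also overwrite any duplicates... Consider this Join-Object cmdlet.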

$configuration = ConvertFrom-SourceTable '

| path       | item  | value  | type |
|------------|-------|--------|------|
| some/path  | item1 | value1 | ALL  |
| some/path  | item2 | UPDATE | ALL  |
| other/path | item1 | value2 | SOME |
| other/path | item1 | value3 | ALL  |
'

$customization= ConvertFrom-SourceTable '

| path       | item  | value  | type |
|------------|-------|--------|------|
| some/path  | item2 | value3 | ALL  |
| new/path   | item3 | value3 | SOME |
| new/path   | item3 | value4 | ALL  |
'

Using the Merge-Object proxy command, alias Merge (see its help):

$configuration | Merge $customization -on path, item

path       item  value  type
----       ----  -----  ----
some/path  item1 value1 ALL
some/path  item2 value3 ALL
other/path item1 value2 SOME
other/path item1 value3 ALL
new/path   item3 value3 SOME
new/path   item3 value4 ALL

Merge two CSV files while adding new entries and overwriting existing ones
