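PowerShell: How do I merge the unique headers from one CSV into another?

Time: 2017-07-13 14:20:24

Tags: powershell csv

Edit 1:

So, I have figured out how to isolate the headers that are unique to CSV 2 so they can be appended to CSV 1.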

$header = ($table | Get-Member -MemberType NoteProperty).Name
$header_add = ($table_add | Get-Member -MemberType NoteProperty).Name
$header_diff = $header + $header_add
$header_diff = ($header_diff | Sort-Object -Unique)
$header_diff = (Compare-Object -ReferenceObject $header -DifferenceObject $header_diff -PassThru)

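$header is the array of headers from CSV 1 ($table). $header_add is the array of headers from CSV 2 ($table_add). By the end of the code block, $header_diff contains only the headers that are unique to CSV 2.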
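The header comparison can be checked on small in-memory arrays before running it against real files. This is a minimal sketch: the array contents are made-up stand-ins for the output of Get-Member, and the `$header_diff_alt` line is an alternative I am suggesting, not part of the original code.

```powershell
# Hypothetical header arrays standing in for the Get-Member results.
$header     = 'Column_1', 'Column_2', 'Column_3'
$header_add = 'Column_1', 'Column_4', 'Column_5'

# Same three steps as in the question: union, de-duplicate, then diff against CSV 1.
$header_diff = $header + $header_add
$header_diff = ($header_diff | Sort-Object -Unique)
$header_diff = (Compare-Object -ReferenceObject $header -DifferenceObject $header_diff -PassThru)

# Alternative (assumes PowerShell 3+ for -notin): filter CSV 2's headers directly.
# Unlike Sort-Object -Unique, this keeps the headers in their original order.
$header_diff_alt = $header_add | Where-Object { $_ -notin $header }

$header_diff
$header_diff_alt
```

Both variables should end up holding only the CSV 2 headers that CSV 1 lacks. Note that Get-Member and Sort-Object return names alphabetically, so the original column order is not preserved by the first approach.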

As far as I can tell, my next step is:

$append = ($table_add | Select-Object $header_diff)

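My problem now is how to append these objects to my CSV 1 ($table) object. I don't quite see Add-Member doing this in a particularly nice way.

Original:

Here are the headers of the two CSV files I am trying to merge.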

CSV 1:

Date, Name, Assigned Router, City, Country, # of Calls  , Calls in  , Calls out

CSV 2:

Date, Name, Assigned Router, City, Country, # of Minutes, Minutes in, Minutes out

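A quick rundown of what these files are: both files contain call information for a set of names for a single day (the Date column holds the same date in every row, because the data eventually goes into a master .xlsx file that contains all dates). Up to and including Country, all columns contain the same values in the same order in both files. The files simply separate the call-count data from the minutes data. I'd like to know whether there is a convenient way to move the differing columns from one CSV into the other.

I have tried the following: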

Import-Csv (Get-ChildItem <directory> -Include <common pattern in file pair>) | Export-Csv <output path> -NoTypeInformation

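This did not combine the matching headers and then append the unique ones. Only the first file processed kept its unique headers. The second file processed had all of its unique headers, and their data, dropped from the output. The shared-header data from the second CSV was added as additional rows.

An example of the failed output I described: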

PS > $small | Format-Table

Column_1 Column_2 Column_3
-------- -------- --------
1        a        a
1        b        b
1        c        c


PS > $small_add | Format-Table

Column_1 Column_4 Column_5
-------- -------- --------
1        x        x
1        y        y
1        z        z


PS > Import-Csv (Get-ChildItem ./*.* -Include "small*.csv") | Select-Object * -unique | Format-Table

Column_1 Column_2 Column_3
-------- -------- --------
1        a        a
1        b        b
1        c        c
1
1
1
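The output above is consistent with how the CSV cmdlets choose columns: ConvertTo-Csv and Export-Csv take the column set from the first object they receive, so properties that only appear on later objects are dropped. A minimal in-memory sketch, using made-up objects in place of the two files:

```powershell
# Hypothetical rows standing in for one line from each CSV file.
$small     = [pscustomobject]@{ Column_1 = 1; Column_2 = 'a'; Column_3 = 'a' }
$small_add = [pscustomobject]@{ Column_1 = 1; Column_4 = 'x'; Column_5 = 'x' }

# The header line is derived from the first object only.
$csvLines = @($small, $small_add) | ConvertTo-Csv -NoTypeInformation

# Show the header line: it contains Column_1..Column_3 but not Column_4/Column_5.
$csvLines[0]
```

This is why concatenating the imported rows of both files cannot produce a side-by-side merge; the rows have to be combined into single objects first.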

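I was wondering whether I could do something like the following algorithm:

  1. Import-Csv CSV_1 and CSV_2 into separate variables

  2. Compare the CSV_2 headers with the CSV_1 headers, storing the headers that differ in CSV_2 in a separate variable

  3. Select-Object all of the CSV_1 headers plus the differing CSV_2 headers

  4. Pipe the Select-Object output to Export-Csv

The only other way I can think of is to go line by line, where I would:

  1. Import-Csv

  2. Remove all shared columns from CSV_2

  3. Convert it from custom objects to the strings PowerShell uses for CSV

  4. Append each line of CSV_2 to the matching line of CSV_1

That feels a bit imprecise and inflexible (the flexibility could be addressed by how the columns/headers are isolated, so that appending the strings is not a problem).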
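The first algorithm can be sketched roughly as follows. This is only an illustration under assumptions: the two here-strings are made-up sample data parsed with ConvertFrom-Csv in place of Import-Csv, both inputs are assumed to have the same number of rows in the same order, and the final step prints CSV text instead of calling Export-Csv.

```powershell
# Step 1: load both inputs (hypothetical sample data instead of files).
$csv1 = @'
Column_1,Column_2,Column_3
1,a,a
1,b,b
'@ | ConvertFrom-Csv
$csv2 = @'
Column_1,Column_4,Column_5
1,x,x
1,y,y
'@ | ConvertFrom-Csv

# Step 2: find the headers that exist only in the second input.
$names1  = $csv1[0].PSObject.Properties.Name
$names2  = $csv2[0].PSObject.Properties.Name
$onlyIn2 = $names2 | Where-Object { $_ -notin $names1 }

# Step 3: build one merged object per row position.
$merged = for ($i = 0; $i -lt $csv1.Count; $i++) {
    $row = [ordered]@{}
    foreach ($n in $names1)  { $row[$n] = $csv1[$i].$n }
    foreach ($n in $onlyIn2) { $row[$n] = $csv2[$i].$n }
    [pscustomobject]$row
}

# Step 4: emit CSV text (swap in Export-Csv to write a file).
$merged | ConvertTo-Csv -NoTypeInformation
```

Because the merged rows are real objects rather than concatenated strings, quoting and embedded commas are handled by the CSV cmdlets.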

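2 Answers:

Answer 0: (score: 2)

This answer focuses on a high-level-abstraction, object-oriented (OO) solution. The OP's own solution relies more on string processing, which has the potential to be faster.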

# The input file paths.
$files = 'csv1.csv', 'csv2.csv'
$outFile = 'csvMerged.csv'

# Read the 2 CSV files into collections of custom objects.
# Note: This reads the entire files into memory.
$doc1 = Import-Csv $files[0]
$doc2 = Import-Csv $files[1]

# Determine the column (property) names that are unique to document 2.
$doc2OnlyColNames = (
  Compare-Object $doc1[0].psobject.properties.name $doc2[0].psobject.properties.name |
    Where-Object SideIndicator -eq '=>'
).InputObject

# Initialize an ordered hashtable that will be used to temporarily store
# each document 2 row's unique values as key-value pairs, so that they
# can be appended as properties to each document-1 row.
$htUniqueRowD2Props = [ordered] @{}

# Process the corresponding rows one by one, construct a merged output object
# for each, and export the merged objects to a new CSV file.
$i = 0
$(foreach($rowD1 in $doc1) {
  # Get the corresponding row from document 2.
  $rowD2 = $doc2[$i++]
  # Extract the values from the unique document-2 columns and store them in the ordered
  # hashtable.
  foreach($pname in $doc2OnlyColNames) { $htUniqueRowD2Props.$pname = $rowD2.$pname }
  # Add the properties represented by the hashtable entries to the
  # document-1 row at hand and output the augmented object (-PassThru).
  $rowD1 | Add-Member -NotePropertyMembers $htUniqueRowD2Props -PassThru
}) | Export-Csv -NoTypeInformation -Encoding Utf8 $outFile

To put the above to the test, you can use the following sample input:

# Create sample input CSV files
@'
Date,Name,Assigned Router,City,Country,# of Calls,Calls in,Calls out
dt,nm,ar,ct,cy,cc,ci,co
dt2,nm2,ar2,ct2,cy2,cc2,ci2,co2
'@ > csv1.csv

# Same column layout and data as above through column 'Country', then different.
@'
Date,Name,Assigned Router,City,Country,# of Minutes,Minutes in,Minutes out
dt,nm,ar,ct,cy,mc,mi,mo
dt2,nm2,ar2,ct2,cy2,mc2,mi2,mo2
'@ > csv2.csv

The code should produce the following content in csvMerged.csv:

"Date","Name","Assigned Router","City","Country","# of Calls","Calls in","Calls out","# of Minutes","Minutes in","Minutes out"
"dt","nm","ar","ct","cy","cc","ci","co","mc","mi","mo"
"dt2","nm2","ar2","ct2","cy2","cc2","ci2","co2","mc2","mi2","mo2"

Answer 1: (score: 1)

Edit 1:

# Read 2 CSVs into PowerShell CSV object
$table = Import-Csv test.csv
$table_add = Import-Csv test_add.csv

# Isolate unique headers in second CSV
$unique_headers = (Compare-Object -ReferenceObject $table[0].PSObject.Properties.Name -DifferenceObject $table_add[0].PSObject.Properties.Name | Where-Object SideIndicator -eq "=>").InputObject

# Convert CSVs to strings, with second CSV only containing unique columns
$table_str = ($table | ConvertTo-Csv -NoTypeInformation)
$table_add_str = ($table_add | Select-Object $unique_headers | ConvertTo-Csv -NoTypeInformation)

# Append CSV 2's unique columns to CSV 1

# Set line counter
$line = 0

# Concatenate CSV 2 lines to the end of CSV 1 lines until one or both are out of lines
While (($table_str[$line] -ne $null) -and ($table_add_str[$line] -ne $null)) {
    If ($line -eq 0) {
        $table_sum_str = $table_str[$line] + "," + $table_add_str[$line]
    }
    If ($line -ne 0) {
        $table_sum_str = $table_sum_str + "`n" + ($table_str[$line] + "," + $table_add_str[$line])
    }
    $line = $line + 1
}
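# Note: $outpath is not defined in this snippet; set it to the destination file path before running.
$table_sum_str | Set-Content -Path $outpath -Encoding UTF8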

Using Measure-Command, the code above mostly takes 14-17 milliseconds to run on my machine. Running Measure-Command on mklement's script yields effectively the same times, judging by eye.

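Note that for both solutions, the data in the 2 CSV files must be in the same order. If you want to combine 2 CSVs that hold complementary data in a different order, you will need to use mklement's object-oriented approach and add a mechanism that matches the rows by position or by name.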
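One possible way to match rows by name rather than by position is to index the second file by a key column. The sketch below is not taken from either answer: the sample data is invented, and it assumes that a column called `Name` uniquely identifies a row in both files.

```powershell
# Hypothetical sample data; the rows of $doc2 are deliberately in a different order.
$doc1 = @'
Name,Calls in
alice,3
bob,5
'@ | ConvertFrom-Csv
$doc2 = @'
Name,Minutes in
bob,50
alice,30
'@ | ConvertFrom-Csv

# Assumption: this column uniquely identifies a row in both documents.
$keyName = 'Name'

# Index document 2's rows by key so each lookup is a single hashtable access.
$lookup = @{}
foreach ($row in $doc2) { $lookup[$row.$keyName] = $row }

# Columns that exist only in document 2.
$doc1Names = $doc1[0].PSObject.Properties.Name
$doc2Only  = $doc2[0].PSObject.Properties.Name | Where-Object { $_ -notin $doc1Names }

$merged = foreach ($row in $doc1) {
    $match = $lookup[$row.$keyName]
    $out = [ordered]@{}
    # Copy every document-1 column in its original order.
    foreach ($p in $row.PSObject.Properties) { $out[$p.Name] = $p.Value }
    # Append the document-2-only columns; rows without a partner get empty values.
    foreach ($n in $doc2Only) {
        $value = if ($match) { $match.$n } else { '' }
        $out[$n] = $value
    }
    [pscustomobject]$out
}
$merged | Format-Table
```

If the key is not unique, later rows of the second file silently overwrite earlier ones in the lookup table, so a composite key (for example Date plus Name) may be needed for real data.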

Original:

For those who don't want to use a hashtable to do this:

# Make sure you're in same directory as files:

# CSV 1
$table = Import-Csv test.csv
# CSV 2
$table_add = Import-Csv test_add.csv

# Get array with CSV 1 headers
$header = ($table | Get-Member -MemberType NoteProperty).Name
# Get array with CSV 2 headers
$header_add = ($table_add | Get-Member -MemberType NoteProperty).Name

# Add arrays of both headers together
$header_diff = $header + $header_add
# Sort the headers, remove duplicate headers (first couple ones), keep unique ones
$header_diff = ($header_diff | Sort-Object -Unique)
# Remove all of CSV 1's unique headers and shared headers
$header_diff = (Compare-Object -ReferenceObject $header -DifferenceObject $header_diff -PassThru)

# Generate a CSV table containing only CSV 2's unique headers
$table_diff = ($table_add | Select-Object $header_diff)

# Convert CSV 1 from a custom PSObject to a string
$table_str = ($table | Select-Object * | ConvertTo-Csv)

# Convert CSV 2 (unique headers only) from custom PSObject to a string
$table_diff_str = ($table_diff | Select-Object * | ConvertTo-Csv)

# Set line counter
$line = 0
# Set flag for if headers have been processed
$headproc = 0
# Concatenate CSV 2 lines to the end of CSV 1 lines until one or both are out of lines.
While (($table_str[$line] -ne $null) -and ($table_diff_str[$line] -ne $null)) {
  If ($headproc -eq 1) {
      $table_sum_str = $table_sum_str + "`n" + ($table_str[$line] + "," + $table_diff_str[$line])
  }
  If ($headproc -eq 0) {
      $table_sum_str = $table_str[$line] + "," + $table_diff_str[$line]
      $headproc = 1
  }
    $line = $line + 1
}
$table_sum_str | ConvertFrom-Csv | Select-Object * | Export-Csv -Path "./test_sum.csv" -Encoding UTF8 -NoTypeInformation

A quick comparison between this script and mklement0's script, using Measure-Command:

PS > Measure-Command {./self.ps1}


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 26
Ticks             : 267771
TotalDays         : 3.09920138888889E-07
TotalHours        : 7.43808333333333E-06
TotalMinutes      : 0.000446285
TotalSeconds      : 0.0267771
TotalMilliseconds : 26.7771


PS > Measure-Command {./mklement.ps1}


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 18
Ticks             : 185058
TotalDays         : 2.141875E-07
TotalHours        : 5.1405E-06
TotalMinutes      : 0.00030843
TotalSeconds      : 0.0185058
TotalMilliseconds : 18.5058

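I think the speed difference arises because I spend time creating a separate CSV PSObject to isolate the columns instead of comparing them directly. mklement's solution also has the advantage of keeping the columns in their original order.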
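Single Measure-Command runs at this scale are noisy, so a difference of a few milliseconds is easier to judge over several runs. A minimal sketch, assuming a placeholder workload; the `$work` script block and the run count are hypothetical and would be replaced with a call such as `{ ./self.ps1 }`:

```powershell
# Hypothetical workload; replace with the script under test.
$work = { 1..1000 | ForEach-Object { $_ * 2 } | Out-Null }
$runs = 10

# Time each run separately and collect the elapsed milliseconds.
$times = foreach ($i in 1..$runs) { (Measure-Command $work).TotalMilliseconds }

# Summarize the spread instead of relying on a single sample.
$stats = $times | Measure-Object -Average -Minimum -Maximum
'{0} runs: avg {1:N1} ms, min {2:N1} ms, max {3:N1} ms' -f $runs, $stats.Average, $stats.Minimum, $stats.Maximum
```

The first run often includes one-time costs such as module loading, so it can be worth discarding it before comparing averages.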