编辑1:
因此,我已经弄明白如何将CSV 2中的唯一标头附加到CSV 1。
$header = ($table | Get-Member -MemberType NoteProperty).Name
$header_add = ($table_add | Get-Member -MemberType NoteProperty).Name
$header_diff = $header + $header_add
$header_diff = ($header_diff | Sort-Object -Unique)
$header_diff = (Compare-Object -ReferenceObject $header -DifferenceObject $header_diff -PassThru)
$ header是来自CSV 1($ table)的标头数组。 $ header_add是CSV 2($ table_add)的标头数组。 $ header_diff在代码块的末尾包含CSV 2中的唯一标头。
据我所知,我的下一步是:
$append = ($table_add | Select-Object $header_diff)
我现在的问题是如何将这些对象附加到我的CSV 1($ table 1)对象?我并没有完全看到Add-Member以一种特别好的方式做到这一点。
原件:
这是我尝试合并的两个CSV文件的标题。
CSV 1:
Date, Name, Assigned Router, City, Country, # of Calls , Calls in , Calls out
CSV 2:
Date, Name, Assigned Router, City, Country, # of Minutes, Minutes in, Minutes out
快速了解这些文件是什么;这两个文件包含一天的一组名称的调用信息(日期列对于每一行具有相同的日期;这是因为这最终会被发送到包含所有日期的主.xlsx文件)。在Country中,所有列都包含两个文件中相同顺序的相同值。这些文件只是分开调用数和分钟数数据。我想知道是否有一种方便的方法可以将不同的列从一个CSV移动到另一个CSV。
我尝试过使用以下内容:
Import-Csv (Get-ChildItem <directory> -Include <common pattern in file pair>) | Export-Csv <output path> -NoTypeInformation
这并没有组合所有匹配的标题,然后追加唯一的标题。只有处理过的第一个文件才会保留其唯一标头。处理的第二个文件在输出中丢弃了所有这些标头和数据。第二个CSV中的共享标头数据已添加为附加行。
我描述的失败输出的示例输出:
PS > $small | Format-Table
Column_1 Column_2 Column_3
-------- -------- --------
1 a a
1 b b
1 c c
PS > $small_add | Format-Table
Column_1 Column_4 Column_5
-------- -------- --------
1 x x
1 y y
1 z z
PS > Import-Csv (Get-ChildItem ./*.* -Include "small*.csv") | Select-Object * -unique | Format-Table
Column_1 Column_2 Column_3
-------- -------- --------
1 a a
1 b b
1 c c
1
1
1
我想知道我是否可以做类似以下算法的事情:
导入-Csv CSV_1和CSV_2以分隔变量
将CSV_2标头与CSV_1标头进行比较,将CSV_2中不同的标头存储到单独的变量中
选择 - 对象所有CSV_1标题,与CSV_2标题不同
将Select-Object输出传递给Export-Csv
我唯一能想到的另一种方法是逐行进行,我会:
Import-Csv
从CSV_2
将其从自定义对象更改为Powershell用于CSV的字符串
将每行CSV_2附加到CSV_1的每一行
感觉有点不精确和不灵活(灵活性可以通过如何隔离列/标题来解决,因此附加字符串没有问题。)
答案 0 :(得分:2)
*本回答重点介绍高级抽象OO 解决方案 * OP's own solution更多地依赖于字符串处理,它有可能更快。 功能
# The input file paths.
$files = 'csv1.csv', 'csv2.csv'
$outFile = 'csvMerged.csv'
# Read the 2 CSV files into collections of custom objects.
# Note: This reads the entire files into memory.
$doc1 = Import-Csv $files[0]
$doc2 = Import-Csv $files[1]
# Determine the column (property) names that are unique to document 2.
$doc2OnlyColNames = (
Compare-Object $doc1[0].psobject.properties.name $doc2[0].psobject.properties.name |
Where-Object SideIndicator -eq '=>'
).InputObject
# Initialize an ordered hashtable that will be used to temporarily store
# each document 2 row's unique values as key-value pairs, so that they
# can be appended as properties to each document-1 row.
$htUniqueRowD2Props = [ordered] @{}
# Process the corresponding rows one by one, construct a merged output object
# for each, and export the merged objects to a new CSV file.
$i = 0
$(foreach($rowD1 in $doc1) {
# Get the corresponding row from document 2.
$rowD2 = $doc2[$i++]
# Extract the values from the unique document-2 columns and store them in the ordered
# hashtable.
foreach($pname in $doc2OnlyColNames) { $htUniqueRowD2Props.$pname = $rowD2.$pname }
# Add the properties represented by the hashtable entries to the
# document-1 row at hand and output the augmented object (-PassThru).
$rowD1 | Add-Member -NotePropertyMembers $htUniqueRowD2Props -PassThru
}) | Export-Csv -NoTypeInformation -Encoding Utf8 $outFile
要将上述内容用于测试,您可以使用以下示例输入:
# Create sample input CSV files
@'
Date,Name,Assigned Router,City,Country,# of Calls,Calls in,Calls out
dt,nm,ar,ct,cy,cc,ci,co
dt2,nm2,ar2,ct2,cy2,cc2,ci2,co2
'@ > csv1.csv
# Same column layout and data as above through column 'Country', then different.
@'
Date,Name,Assigned Router,City,Country,# of Minutes,Minutes in,Minutes out
dt,nm,ar,ct,cy,mc,mi,mo
dt2,nm2,ar2,ct2,cy2,mc2,mi2,mo2
'@ > csv2.csv
代码应在csvMerged.csv
中生成以下内容:
"Date","Name","Assigned Router","City","Country","# of Calls","Calls in","Calls out","# of Minutes","Minutes in","Minutes out"
"dt","nm","ar","ct","cy","cc","ci","co","mc","mi","mo"
"dt2","nm2","ar2","ct2","cy2","cc2","ci2","co2","mc2","mi2","mo2"
答案 1 :(得分:1)
编辑1:
# Read 2 CSVs into PowerShell CSV object
$table = Import-Csv test.csv
$table_add = Import-Csv test_add.csv
# Isolate unique headers in second CSV
$unique_headers = (Compare-Object -ReferenceObject $table[0].PSObject.Properties.Name -DifferenceObject $table_add[0].PSObject.Properties.Name | Where-Object SideIndicator -eq "=>").InputObject
# Convert CSVs to strings, with second CSV only containing unique columns
$table_str = ($table | ConvertTo-Csv -NoTypeInformation)
$table_add_str = ($table_add | Select-Object $unique_headers | ConvertTo-Csv -NoTypeInformation)
# Append CSV 2's unique columns to CSV 1
# Set line counter
$line = 0
# Concatenate CSV 2 lines to the end of CSV 1 lines until one or both are out of lines
While (($table_str[$line] -ne $null) -and ($table_add_str[$line] -ne $null)) {
If ($line -eq 0) {
$table_sum_str = $table_str[$line] + "," + $table_add_str[$line]
}
If ($line -ne 0) {
$table_sum_str = $table_sum_str + "`n" + ($table_str[$line] + "," + $table_add_str[$line])
}
$line = $line + 1
}
$table_sum_str | Set-Content -Path $outpath -Encoding UTF8
使用Measure-Command,我的机器上面的代码大部分都需要14-17毫秒才能运行。在mklement上运行Measure-Command可以有效地产生相同的时间,只需要观察它。
请注意,对于这两种解决方案,2个CSV文件中的数据必须采用相同的顺序。如果要将具有互补数据但顺序不同的2个CSV一起添加,则需要使用mklement的面向对象方法并添加机制以将数据与位置或名称进行匹配。
原件:
对于那些不想使用哈希表来执行此操作的人:
# Make sure you're in same directory as files:
# CSV 1
$table = Import-Csv test.csv
# CSV 2
$table_add = Import-Csv test_add.csv
# Get array with CSV 1 headers
$header = ($table | Get-Member -MemberType NoteProperty).Name
# Get array with CSV 2 headers
$header_add = ($table_add | Get-Member -MemberType NoteProperty).Name
# Add arrays of both headers together
$header_diff = $header + $header_add
# Sort the headers, remove duplicate headers (first couple ones), keep unique ones
$header_diff = ($header_diff | Sort-Object -Unique)
# Remove all of CSV 1's unique headers and shared headers
$header_diff = (Compare-Object -ReferenceObject $header -DifferenceObject $header_diff -PassThru)
# Generate a CSV table containing only CSV 2's unique headers
$table_diff = ($table_add | Select-Object $header_diff)
# Convert CSV 1 from a custom PSObject to a string
$table_str = ($table | Select-Object * | ConvertTo-Csv)
# Convert CSV 2 (unique headers only) from custom PSObject to a string
$table_diff_str = ($table_diff | Select-Object * | ConvertTo-Csv)
# Set line counter
$line = 0
# Set flag for if headers have been processed
$headproc = 0
# Concatenate CSV 2 lines to the end of CSV 1 lines until one or both are out of lines.
While (($table_str[$line] -ne $null) -and ($table_diff_str[$line] -ne $null)) {
If ($headproc -eq 1) {
$table_sum_str = $table_sum_str + "`n" + ($table_str[$line] + "," + $table_diff_str[$line])
}
If ($headproc -eq 0) {
$table_sum_str = $table_str[$line] + "," + $table_diff_str[$line]
$headproc = 1
}
$line = $line + 1
}
$table_sum_str | ConvertFrom-Csv | Select-Object * | Export-Csv -Path "./test_sum.csv" -Encoding UTF8 -NoTypeInformation
使用此命令和mklement0脚本之间的Measure-Command进行快速比较。
PS > Measure-Command {./self.ps1}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 26
Ticks : 267771
TotalDays : 3.09920138888889E-07
TotalHours : 7.43808333333333E-06
TotalMinutes : 0.000446285
TotalSeconds : 0.0267771
TotalMilliseconds : 26.7771
PS > Measure-Command {./mklement.ps1}
Days : 0
Hours : 0
Minutes : 0
Seconds : 0
Milliseconds : 18
Ticks : 185058
TotalDays : 2.141875E-07
TotalHours : 5.1405E-06
TotalMinutes : 0.00030843
TotalSeconds : 0.0185058
TotalMilliseconds : 18.5058
我认为速度差异是因为我花时间创建一个单独的CSV PSObject来隔离列而不是直接比较它们。 mklement还具有保持列的顺序相同的优点。