我有一个很大的CSV文件,我想按大小对其进行拆分,并且标头应位于每个文件中。
例如,我有1.6MB的文件,并且我希望子文件的大小不能超过512KB。因此,实际上,父文件应具有4个子文件。 尝试使用下面的简单程序,但文件正在与空白子文件拆分。
function csvSplitter {
$csvFile = "D:\Test\PTest\Dummy.csv";
$split = 10;
$content = Import-Csv $csvFile;
$start = 1;
$end = 0;
$records_per_file = [int][Math]::Ceiling($content.Count / $split);
for($i = 1; $i -le $split; $i++) {
$end += $records_per_file;
$content | Where-Object {[int]$_.Id -ge $start -and [int]$_.Id -le $end} | Export-Csv -Path "D:\Test\PTest\Destination\file$i.csv" -NoTypeInformation;
$start = $end + 1;
}
}csvSplitter
文件大小的逻辑尚未编写。
试图添加两个文件,但我想没有添加文件的选项。
答案 0 :(得分:1)
这与解决方案略有不同。 [咧嘴]
它...
采用这种回旋方法的原因是为了节省RAM。将文件加载为CSV的一个缺点是所需的RAM数量过多。仅加载文本行就需要更少的RAM。
$SourceDir = $env:TEMP
$InFileName = 'LargeFile.csv'
$InFullFileName = Join-Path -Path $SourceDir -ChildPath $InFileName
$BatchCount = 4
$DestDir = $env:TEMP
$OutFileName = 'LF_Batch_.csv'
$OutFullFileName = Join-Path -Path $DestDir -ChildPath $OutFileName
#region >>> build file to work with
# remove this region when you are ready to do this with your test data OR to do this with real data
if (-not (Test-Path -LiteralPath $InFullFileName))
{
Get-ChildItem -LiteralPath $env:APPDATA -Recurse -File |
Sort-Object -Property Name |
Select-Object Name, Length, LastWriteTime, Directory |
Export-Csv -LiteralPath $InFullFileName -NoTypeInformation
}
#endregion >>> build file to work with
$CsvAsText = Get-Content -LiteralPath $InFullFileName
[array]$HeaderLine = $CsvAsText[0]
$BatchSize = [int]($CsvAsText.Count / $BatchCount) + 1
$StartLine = 1
foreach ($B_Index in 1..$BatchCount)
{
if ($B_Index -ne 1)
{
$StartLine = $StartLine + $BatchSize + 1
}
$CurrentOutFullFileName = $OutFullFileName.Replace('_.', ('_{0}.' -f $B_Index))
$HeaderLine + $CsvAsText[$StartLine..($StartLine + $BatchSize)] |
Set-Content -LiteralPath $CurrentOutFullFileName
}
屏幕上没有输出,但是我得到了名为LF_Batch_1.csv
至LF_Batch_4.csv
的4个文件,它们按预期包含源文件的4our部分。最后一个文件的行数稍微少一些,但是当行数不能被批数均分时,会发生这种情况。 [咧嘴]
答案 1 :(得分:0)
尝试一下:
Add-Type -AssemblyName System.Collections
function Split-Csv {
param (
[string]$filePath,
[int]$partsNum
)
# Use generic lists for import/export
[System.Collections.Generic.List[object]]$contentImport = @()
[System.Collections.Generic.List[object]]$contentExport = @()
# import csv-file
$contentImport = Import-Csv $filePath
# how many lines per export file
$linesPerFile = [Math]::Max( [int]($contentImport.Count / $partsNum), 1 )
# start pointer for source list
$startPointer = 0
# counter for file name
$counter = 1
# main loop
while( $startPointer -lt $contentImport.Count ) {
# clear export list
[void]$contentExport.Clear()
# determine from-to from source list to export
$endPointer = [Math]::Min( $startPointer + $linesPerFile, $contentImport.Count )
# move lines to export to export list
[void]$contentExport.AddRange( $contentImport.GetRange( $startPointer, $endPointer - $startPointer ) )
# export
$contentExport | Export-Csv -Path ($filePath.Replace('.', $counter.ToString() + '.' ) ) -NoTypeInformation -Force
# move pointer
$startPointer = $endPointer
# increase counter for filename
$counter++
}
}
Split-Csv -filePath 'test.csv' -partsNum 7
答案 2 :(得分:-1)
尝试运行this script:
$sw = new-object System.Diagnostics.Stopwatch
$sw.Start()
$FilePath = $HOME +'\Documents\Projects\ADOPT\Data8277.csv'
$SplitDir = $HOME +'\Documents\Projects\ADOPT\Split\'
CSV-FileSplitter -Path $FilePath -PartSizeBytes 35MB -SplitDir $SplitDir #-Verbose
$sw.Stop()
Write-Host "Split complete in " $sw.Elapsed.TotalSeconds "seconds"
我为大于 50GB 的文件创建了这个