I am trying to load a 160 GB CSV file into SQL using a PowerShell script from GitHub, and I get this error:
Exception calling "Add" with "1" argument(s): "Input array is longer than the number of columns in this table."
At C:\b.ps1:54 char:26
+ [void]$datatable.Rows.Add <<<< ($line.Split($delimiter))
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : DotNetMethodException
The script is below:
<# 8-faster-runspaces.ps1 #>
# Set CSV attributes
$csv = "M:\d\s.txt"
$delimiter = "`t"
# Set connstring
$connstring = "Data Source=.;Integrated Security=true;Initial Catalog=PresentationOptimized;PACKET SIZE=32767;"
# Set batchsize to 2000
$batchsize = 2000
# Create the datatable
$datatable = New-Object System.Data.DataTable
# Add generic columns
$columns = (Get-Content $csv -First 1).Split($delimiter)
foreach ($column in $columns) {
    [void]$datatable.Columns.Add()
}
# Setup runspace pool and the scriptblock that runs inside each runspace
$pool = [RunspaceFactory]::CreateRunspacePool(1,5)
$pool.ApartmentState = "MTA"
$pool.Open()
$runspaces = @()
# Setup scriptblock. This is the workhorse. Think of it as a function.
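$scriptblock = {
    Param (
        [string]$connstring,
        [object]$dtbatch,
        [int]$batchsize
    )
    $bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($connstring,"TableLock")
    $bulkcopy.DestinationTableName = "abc"
    $bulkcopy.BatchSize = $batchsize
    $bulkcopy.WriteToServer($dtbatch)
    $bulkcopy.Close()
    $bulkcopy.Dispose()
}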
# Start timer
$time = [System.Diagnostics.Stopwatch]::StartNew()
# Open the text file from disk and process.
$reader = New-Object System.IO.StreamReader($csv)
Write-Output "Starting insert.."
while ((($line = $reader.ReadLine()) -ne $null))
{
    [void]$datatable.Rows.Add($line.Split($delimiter))
    if ($datatable.rows.count % $batchsize -eq 0)
    {
        $runspace = [PowerShell]::Create()
        [void]$runspace.AddScript($scriptblock)
        [void]$runspace.AddArgument($connstring)
        [void]$runspace.AddArgument($datatable) # <-- Send datatable
        [void]$runspace.AddArgument($batchsize)
        $runspace.RunspacePool = $pool
        $runspaces += [PSCustomObject]@{ Pipe = $runspace; Status = $runspace.BeginInvoke() }
        # Overwrite object with a shell of itself
        $datatable = $datatable.Clone() # <-- Create new datatable object
    }
}
# Close the file
$reader.Close()
# Wait for runspaces to complete
while ($runspaces.Status.IsCompleted -notcontains $true) {}
# End timer
$secs = $time.Elapsed.TotalSeconds
# Cleanup runspaces
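foreach ($runspace in $runspaces ) {
    [void]$runspace.Pipe.EndInvoke($runspace.Status) # EndInvoke method retrieves the results of the asynchronous call
    $runspace.Pipe.Dispose()
}
# Cleanup runspace pool
$pool.Close()
$pool.Dispose()
# Cleanup SQL Connections
[System.Data.SqlClient.SqlConnection]::ClearAllPools()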
# Done! Format output then display
$totalrows = 1000000
$rs = "{0:N0}" -f [int]($totalrows / $secs)
$rm = "{0:N0}" -f [int]($totalrows / $secs * 60)
$mill = "{0:N0}" -f $totalrows
Write-Output "$mill rows imported in $([math]::round($secs,2)) seconds ($rs rows/sec and $rm rows/min)"
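Answer 0 (score: 1)
Working with a 160 GB input file is going to be a pain. You can't really load it into any kind of editor, and you can't realistically analyze that much data without some serious automation.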
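The error in the question says that some row supplied more values than the table has columns, so one form such automation could take is a single streaming pass that flags every line whose field count differs from the first line's. This is only a sketch under assumptions: `Find-BadRow` and its parameters are hypothetical names (not part of the original script), it assumes PowerShell 3.0 or later and a single-character delimiter, and it treats the first line as defining the expected width, as the script does.

```powershell
# Hypothetical helper, not part of 8-faster-runspaces.ps1.
# Streams a delimited file and reports lines whose field count differs
# from that of the first line, without loading the file into memory.
function Find-BadRow {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [char] $Delimiter = "`t",
        [int]  $MaxReport = 20          # stop after this many offenders
    )

    # Resolve to a full path: .NET does not follow PowerShell's current location.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $reader   = New-Object System.IO.StreamReader($fullPath)
    try {
        $first = $reader.ReadLine()
        if ($null -eq $first) { return }   # empty file, nothing to check

        $expected   = $first.Split($Delimiter).Length
        $lineNumber = 1
        $reported   = 0

        while ($null -ne ($line = $reader.ReadLine())) {
            $lineNumber++
            $actual = $line.Split($Delimiter).Length
            if ($actual -ne $expected) {
                [PSCustomObject]@{
                    LineNumber = $lineNumber
                    Expected   = $expected
                    Actual     = $actual
                }
                $reported++
                if ($reported -ge $MaxReport) { break }
            }
        }
    }
    finally {
        $reader.Dispose()
    }
}
```

It could be run as `Find-BadRow -Path 'M:\d\s.txt' | Format-Table`. A plain split like this does not understand quoted fields, so a delimiter embedded inside quotes would be reported as an extra column.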
To find the offending rows, try a binary search:
1) Split the file into two roughly equal chunks.
2) Try to load the first chunk.
3) If it loads, move on to the second chunk. If not, see 6).
4) Try to load the second chunk.
5) If it loads, both files are valid and you have some other data quality issue; start looking into other causes. If not, see 6).
6) If either load failed, start again from step 1), using the failed chunk as the input file.
7) Repeat until you have narrowed it down to the offending row(s).
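Step 1 of the procedure above can itself be scripted, since a file this size has to be split by streaming rather than in an editor. The following is a minimal sketch, not a definitive implementation: `Split-FileInHalf` and the `.part1`/`.part2` output names are hypothetical, PowerShell 3.0 or later is assumed, and the halfway point is estimated from line byte counts, so the two chunks are only approximately equal.

```powershell
# Hypothetical helper: split a text file into two roughly equal chunks on a
# line boundary, streaming line by line so the file never sits in memory.
function Split-FileInHalf {
    param(
        [Parameter(Mandatory)] [string] $Path
    )

    # Resolve to a full path: .NET does not follow PowerShell's current location.
    $fullPath  = (Resolve-Path -LiteralPath $Path).ProviderPath
    $halfBytes = (Get-Item -LiteralPath $fullPath).Length / 2
    $outPaths  = "$fullPath.part1", "$fullPath.part2"

    $reader = New-Object System.IO.StreamReader($fullPath)
    $first  = New-Object System.IO.StreamWriter($outPaths[0])
    $second = New-Object System.IO.StreamWriter($outPaths[1])
    try {
        [long] $consumed = 0
        while ($null -ne ($line = $reader.ReadLine())) {
            # Lines that start before the estimated midpoint go to the first chunk.
            if ($consumed -lt $halfBytes) { $first.WriteLine($line) }
            else                          { $second.WriteLine($line) }

            # Approximate bytes consumed: line content plus an assumed CRLF.
            $consumed += $reader.CurrentEncoding.GetByteCount($line) + 2
        }
    }
    finally {
        $second.Dispose()
        $first.Dispose()
        $reader.Dispose()
    }

    $outPaths   # emit the two chunk paths
}
```

Calling `Split-FileInHalf -Path 'M:\d\s.txt'` would write the two chunks next to the input and return their paths, so enough free disk space for a second copy of the data is needed. Because the script in the question derives its column count from the first line of whatever file it is given, it is worth checking that each chunk's first line has the expected width before loading it.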