Importing a large CSV into SQL Server using PowerShell

Time: 2018-01-18 14:32:45

Tags: sql-server database excel csv bigdata

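I came across an article that discusses how to bulk-import huge amounts of data relatively quickly using PowerShell. I have a typical CSV file of about 5 million rows, formatted in the usual way.

Whether I import a txt file or a csv file, I get the same error message. The $csvdelimiter / $firstRowColumnNames settings also cause problems of their own.

I have spent hours trying to get this to work with MY csv file, and no matter what I try, I get the same error message. All fields accept NULL, and the field names are identical in every respect between the table and the csv file. The table has no primary key.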


The error message is listed below.

# Database variables
$sqlserver = "SERVERNAMEHERE"
$database = "autos"
$table = "AgedAutos"

# CSV variables
$csvfile = "C:\temp\aged.csv"
$csvdelimiter = "',"
$firstRowColumnNames = $true

################### No need to modify anything below ###################
Write-Host "Script started..."
$elapsed = [System.Diagnostics.Stopwatch]::StartNew() 
[void][Reflection.Assembly]::LoadWithPartialName("System.Data")
[void][Reflection.Assembly]::LoadWithPartialName("System.Data.SqlClient")

# 50k worked fastest and kept memory usage to a minimum
$batchsize = 50000

# Build the sqlbulkcopy connection, and set the timeout to infinite
$connectionstring = "Data Source=$sqlserver;Integrated Security=true;Initial Catalog=$database;"
$bulkcopy = New-Object Data.SqlClient.SqlBulkCopy($connectionstring, [System.Data.SqlClient.SqlBulkCopyOptions]::TableLock)
$bulkcopy.DestinationTableName = $table
$bulkcopy.bulkcopyTimeout = 0
$bulkcopy.batchsize = $batchsize

# Create the datatable, and autogenerate the columns.
$datatable = New-Object System.Data.DataTable

# Open the text file from disk
$reader = New-Object System.IO.StreamReader($csvfile)
$columns = (Get-Content $csvfile -First 1).Split($csvdelimiter)
if ($firstRowColumnNames -eq $true) { $null = $reader.readLine() }

foreach ($column in $columns) { 
    $null = $datatable.Columns.Add()
}

# Read in the data, line by line
while (($line = $reader.ReadLine()) -ne $null)  {
    $null = $datatable.Rows.Add($line.Split($csvdelimiter))
    $i++; if (($i % $batchsize) -eq 0) { 
        $bulkcopy.WriteToServer($datatable) 
        Write-Host "$i rows have been inserted in $($elapsed.Elapsed.ToString())."
        $datatable.Clear() 
    } 
} 

# Add in all the remaining rows since the last clear
if ($datatable.Rows.Count -gt 0) {
    $bulkcopy.WriteToServer($datatable)
    $datatable.Clear()
}

# Clean Up
$reader.Close(); $reader.Dispose()
$bulkcopy.Close(); $bulkcopy.Dispose()
$datatable.Dispose()

Write-Host "Script complete. $i rows have been inserted into the database."
Write-Host "Total Elapsed Time: $($elapsed.Elapsed.ToString())"
# Sometimes the Garbage Collector takes too long to clear the huge datatable.
[System.GC]::Collect()

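I don't know what the error means, because I couldn't find anything useful on Google. I think one of the columns may be defined incorrectly in SQL Server, but I could be wrong.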
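
A column-mismatch suspicion like this can be checked without touching the database by comparing each line's field count against the header's. The following is a minimal sketch, not part of the original script: the function name Get-FieldCountMismatch and the sample rows are made up for illustration, and it assumes a simple CSV with no quoted fields containing the delimiter.

```powershell
# Hypothetical helper: report lines whose field count differs from the header's.
function Get-FieldCountMismatch {
    param(
        [string[]]$Lines,     # header line first, then data lines
        [string]$Delimiter    # delimiter to test, e.g. ","
    )
    # Passing a string[] selects the Split overload that treats the
    # delimiter as one literal string rather than a set of characters.
    $separator = [string[]]@($Delimiter)
    $expected  = $Lines[0].Split($separator, [System.StringSplitOptions]::None).Count

    for ($n = 1; $n -lt $Lines.Count; $n++) {
        $actual = $Lines[$n].Split($separator, [System.StringSplitOptions]::None).Count
        if ($actual -ne $expected) {
            # LineNumber is 1-based, counting the header as line 1
            [pscustomobject]@{ LineNumber = $n + 1; Expected = $expected; Actual = $actual }
        }
    }
}

# Made-up sample data, not the real aged.csv
$sample = @(
    'Year,Make,Model'
    '2015,Ford,Focus'
    '2016,Honda,Civic,EX'
)
$mismatches = @(Get-FieldCountMismatch -Lines $sample -Delimiter ',')
$mismatches | Format-Table
```

On the real file, the first few thousand lines could be passed in with something like `Get-Content $csvfile -TotalCount 5000`, so the whole 5-million-row file does not have to be loaded.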

Please help me figure out what the problem is. Thanks.

1 answer:

Answer 0 (score: 0)

You are getting all of the data in the first column because the value of $csvdelimiter is incorrect. You have: $csvdelimiter = "'," It should be: $csvdelimiter = ","
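
The effect of the two delimiter values can be checked on a single line. This is a minimal sketch with a made-up sample row. It passes the delimiter as [string[]] so that it is treated as one literal string, because how .Split("',") interprets a two-character string can differ between Windows PowerShell and newer PowerShell versions.

```powershell
# Made-up sample row, not taken from the real aged.csv
$line = '2015,Ford,Focus'

# Delimiter from the question: the literal two-character sequence ',
# It never occurs in the row, so the row comes back as a single field.
$bad = $line.Split([string[]]@("',"), [System.StringSplitOptions]::None)

# Corrected delimiter: a single comma
$good = $line.Split([string[]]@(","), [System.StringSplitOptions]::None)

Write-Host ('Two-character delimiter: {0} field(s)' -f $bad.Count)   # 1 field
Write-Host ('Comma delimiter: {0} field(s)' -f $good.Count)          # 3 fields
```

When a row comes back as a single field, the script's $datatable.Rows.Add(...) call receives only one value, which is consistent with the whole line landing in the first column.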