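I want to insert all rows of a SQL Server table into a BigQuery table that has the same schema. Streaming the rows one at a time is very slow: inserting 1000 rows with the code below takes about 10 minutes. In this code I loop over the first 10 files in a folder and insert the content of each file into a single SQL Server table. Once I have looped over the desired files, I loop over the SQL Server table (which holds all rows from all files) and insert its content row by row into the BigQuery table. Finally I delete those files and empty the SQL Server table.
This operation is very slow.
Does anyone have a better solution for inserting the content of a SQL Server table into a BigQuery table automatically (through code)? For example, inserting the whole content of the SQL Server table into the BigQuery table in one block (rather than row by row).
Thanks.
Here is my code (in ColdFusion):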
<cfsilent>
<cfinclude template="app_locals.cfm" />
<cfinclude template="act_BigqueryApiAccess.cfm" />
</cfsilent>
<!--- 1st BQ process: insert the processed parcels --->
<!--- record the start of the 1st BQ process (TShipping) --->
<cfset BigqueryTShipping_StartDate=now()>
<cfset QueryName = "InsertBigqueryLogTShippingStartDate">
<cfinclude template="qry_item.cfm">
<cfdirectory action="list" directory="#FileRoot#\_data\_Bigquery\TShipping" listinfo="all" type="file" name="FList" sort="datelastmodified">
<cfset FileList = Valuelist(FList.name)>
<cfoutput><h3>FileList: #FileList#</h3></cfoutput>
<cfif len(trim(FileList))>
<!--- process the last 10 files (the MaxNbFile least recent ones) --->
<cfset FileLoop = 1>
<cfloop list="#FileList#" index="FileName">
<cfset PathFile="#FileRoot#\_data\_Bigquery\TShipping\#FileName#">
<cfset QueryName = "InsertTShipping">
<cfinclude template="qry_item.cfm">
<cfset FileLoop = FileLoop+1>
<cfif FileLoop GT Attributes.MaxNbFile>
<cfbreak />
</cfif>
</cfloop>
</cfif>
<!--- instantiate an object of type (class) TableRow --->
<cfobject action="create" type="java" class="com.google.api.services.bigquery.model.TableRow" name="row">
<!--- <cfdump var="#row#"> --->
<cfset QueryName = "GetParcels">
<cfinclude template="qry_item.cfm">
<cfloop query="GetParcels">
<cfset row.set("Tracking_Date",mid(Tracking_Date,6,19))>
<cfset row.set("TShipping_ID", TShipping_ID)>
<cfset row.set("TShipping_Tracking", TShipping_Tracking)>
<cfset row.set("Shipper_ID", Shipper_ID)>
<cfset rows.setInsertId(sys.currentTimeMillis())>
<cfset rows.setJson(row)>
<cfset rowList.add(rows)>
<cfset content=rqst.setRows(rowList)>
<cfset response = bq.tabledata().insertAll(Project_ID,Dataset_ID,Table_ID, content).execute()>
</cfloop>
<!--- empty the TShipping_BQ table --->
<cfset QueryName = "DeleteOldTShipping_BQParcels">
<cfinclude template="qry_item.cfm">
<!--- delete the processed files --->
<cfif len(trim(FileList))>
<cfset TShippingFileNb=len(trim(FileList))>
<cfset FileLoop = 1>
<cfloop list="#FileList#" index="FileName">
<cfset PathFile="#FileRoot#\_data\_Bigquery\TShipping\#FileName#">
<cffile action="move" source="#PathFile#" destination="#FileRoot#\_data\_Bigquery\TShippingArchive">
<!--- <cffile action="delete" file="#PathFile#"> --->
<cfset FileLoop = FileLoop+1>
<cfif FileLoop GT Attributes.MaxNbFile>
<cfbreak />
</cfif>
</cfloop>
<cfelse>
<cfset TShippingFileNb=0>
</cfif>
<!--- record the number of TShipping files processed --->
<cfset QueryName = "InsertBigqueryLogTShippingNb">
<cfinclude template="qry_item.cfm">
<!--- record the end of the 1st BQ process --->
<cfset BigqueryTShipping_EndDate=now()>
<cfset QueryName = "InsertBigqueryLogTShippingEndDate">
<cfinclude template="qry_item.cfm">
Answer 0 (score: 1)
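You should be able to move the insertAll() call out of the loop. At some point you may be inserting too many records in one request and will need to batch them, i.e. once you reach 1000 records, insert them and reset your rowList array.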
<cfloop query="GetParcels">
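<!--- re-create row and rows on every iteration; otherwise each pass overwrites the same referenced objects and rowList ends up holding one row repeated --->
<cfset row = createObject("java", "com.google.api.services.bigquery.model.TableRow").init()>
<cfset rows = createObject("java", "com.google.api.services.bigquery.model.TableDataInsertAllRequest$Rows").init()>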
<cfset row.set("Tracking_Date",mid(Tracking_Date,6,19))>
<cfset row.set("TShipping_ID", TShipping_ID)>
<cfset row.set("TShipping_Tracking", TShipping_Tracking)>
<cfset row.set("Shipper_ID", Shipper_ID)>
<cfset rows.setInsertId(sys.currentTimeMillis())>
<cfset rows.setJson(row)>
<cfset rowList.add(rows)>
</cfloop>
<cfset content=rqst.setRows(rowList)>
<cfset response = bq.tabledata().insertAll(Project_ID,Dataset_ID,Table_ID,content).execute()>
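One caveat once many rows travel in the same request: BigQuery uses insertId as a best-effort de-duplication key, and sys.currentTimeMillis() returns the same value for every row built within the same millisecond, so rows in one batch may be treated as duplicates of each other. Below is a minimal sketch in plain Java of two ways to derive a distinct id per row. The InsertIds class and both helper names are hypothetical, and the first helper assumes the row has a unique key (for example TShipping_ID, if it is unique in your table).

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

final class InsertIds {
    /** Stable id built from the row's own key: retrying the same row reuses the same id. */
    static String fromKey(String table, long rowKey) {
        return table + "-" + rowKey;
    }

    /** Fallback for rows without a natural key: random, so a retried row gets a new id. */
    static String random() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        // Build ids for 1000 rows and count how many distinct values we get.
        Set<String> seen = new HashSet<>();
        for (long key = 1; key <= 1000; key++) {
            seen.add(fromKey("TShipping", key));
        }
        System.out.println(seen.size() + " distinct ids");
    }
}
```

In the ColdFusion loop the same idea would amount to passing a per-row string to setInsertId, for instance one built from TShipping_ID or from createUUID(), instead of the current time.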
An example of what I mean by batching:
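<cfloop query="GetParcels">
<!--- re-create row and rows on every iteration so that each list entry is a distinct object --->
<cfset row = createObject("java", "com.google.api.services.bigquery.model.TableRow").init()>
<cfset rows = createObject("java", "com.google.api.services.bigquery.model.TableDataInsertAllRequest$Rows").init()>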
<cfset row.set("Tracking_Date",mid(Tracking_Date,6,19))>
<cfset row.set("TShipping_ID", TShipping_ID)>
<cfset row.set("TShipping_Tracking", TShipping_Tracking)>
<cfset row.set("Shipper_ID", Shipper_ID)>
<cfset rows.setInsertId(sys.currentTimeMillis())>
<cfset rows.setJson(row)>
<cfset rowList.add(rows)>
<cfif arrayLen(rowList) EQ 1000>
<cfset content=rqst.setRows(rowList)>
<cfset response = bq.tabledata().insertAll(Project_ID,Dataset_ID,Table_ID,content).execute()>
<cfset rowList = []>
</cfif>
</cfloop>
<!--- flush the remaining rows; this check skips the call when the row count is an exact multiple of 1000 --->
<cfif ! arrayIsEmpty(rowList)>
<cfset content=rqst.setRows(rowList)>
<cfset response = bq.tabledata().insertAll(Project_ID,Dataset_ID,Table_ID,content).execute()>
</cfif>
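For reference, the same batching pattern can be isolated from the BigQuery calls. The sketch below is plain Java under stated assumptions: BatchSender and sendInBatches are hypothetical names, and the sink parameter merely stands in for the real insertAll(...).execute() request, so nothing here talks to BigQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchSender {
    /**
     * Hands rows to the sink in chunks of at most batchSize and returns
     * the number of chunks sent. The sink is a stand-in for one insertAll request.
     */
    static <T> int sendInBatches(Iterable<T> rows, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int requests = 0;
        for (T row : rows) {
            batch.add(row);
            if (batch.size() == batchSize) {
                sink.accept(batch);
                requests++;
                // Start a fresh list instead of clearing: the sink may still hold the old one.
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            // Flush the remainder; skipped when the count is an exact multiple of batchSize.
            sink.accept(batch);
            requests++;
        }
        return requests;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            rows.add(i);
        }
        List<Integer> sizes = new ArrayList<>();
        int requests = sendInBatches(rows, 1000, b -> sizes.add(b.size()));
        System.out.println(requests + " requests, sizes " + sizes); // 3 requests, sizes [1000, 1000, 500]
    }
}
```

With 1000 rows per request, the 1000-row case from the question goes out as a single call instead of 1000 separate ones, which is where most of the time was being spent.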