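Importing data from Google Sheets into a MySQL table

Date: 2017-03-20 20:28:25

Tags: google-apps-script

I am using Google Apps Script to import data from a Google Sheet into a MySQL table. The dataset I need to import is very large, and I am running into the "Exceeded maximum execution time" exception. Are there any other options to speed up execution?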

var address = 'database_IP_address';
var rootPwd = 'root_password';
var user = 'user_name';
var userPwd = 'user_password';
var db = 'database_name';

var root = 'root';
var instanceUrl = 'jdbc:mysql://' + address;
var dbUrl = instanceUrl + '/' + db;

function googleSheetsToMySQL() {   

  var RecId;
  var Code;
  var ProductDescription;
  var Price;

  var dbconnection = Jdbc.getConnection(dbUrl, root, rootPwd);
  var statement = dbconnection.createStatement();
  var googlesheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('product'); 
  var data = googlesheet.getDataRange().getValues();  

  for (var i = 1; i < data.length; i++) {
    RecId = data[i][0];
    Code = data[i][1];
    ProductDescription = data[i][2];
    Price = data[i][3];

    var sql = "{call [dbo].[sp_googlesheetstotable](?,?,?,?)}";
    statement = dbconnection.prepareCall(sql);
    statement.setString(1, RecId);
    statement.setString(2, Code);
    statement.setString(3, ProductDescription);
    statement.setString(4, Price);
    statement.executeUpdate();
  }

  statement.close();
  dbconnection.close();
}

Using batch execution:

dbconnection.setAutoCommit(false);

for (var i = 1; i < data.length; i++) {
  RecId = data[i][0];
  Code = data[i][1];
  ProductDescription = data[i][2];
  Price = data[i][3];

  var sql = "{call [dbo].[sp_googlesheetstotable](?,?,?,?)}";
  statement = dbconnection.prepareCall(sql);
  statement.setString(1, RecId);
  statement.setString(2, Code);
  statement.setString(3, ProductDescription);
  statement.setString(4, Price);
  statement.addBatch();
  statement.executeBatch();
}

dbconnection.commit();

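2 Answers:

Answer 0 (score: 0)

Try checking this related SO question for some information on how to import data from a Google Spreadsheet into MySQL using Apps Script code.

Now, since your error is the "Exceeded maximum execution time" exception, keep in mind that the Apps Script quotas allow a script a maximum execution time of only 6 minutes per execution. This means your script exceeded that limit.

Try checking this page for techniques on how to prevent a Google script from exceeding the maximum execution time limit.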
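
One common technique for staying under that limit is to process rows until a time budget is nearly used up, remember where you stopped, and let a later run continue from there. The following is a minimal sketch under assumptions: `processWithinBudget`, `handleRow`, and the 5-minute budget are hypothetical names and values, not part of the question's code.

```javascript
// Hypothetical helper: process rows until the time budget runs out.
// Returns the index of the next unprocessed row (rows.length when finished).
function processWithinBudget(rows, startIndex, handleRow, budgetMs, now) {
  now = now || Date.now;          // the clock is injectable so the logic can be tested
  var startedAt = now();
  var i = startIndex;
  while (i < rows.length) {
    if (now() - startedAt >= budgetMs) {
      break;                      // out of time: stop and report where to resume
    }
    handleRow(rows[i], i);
    i++;
  }
  return i;
}

// Example: keep a safety margin below the 6-minute limit (5 minutes is an assumption).
var FIVE_MINUTES_MS = 5 * 60 * 1000;
var demoRows = [['id', 'code'], [1, 'A'], [2, 'B']];
var seen = [];
var next = processWithinBudget(demoRows, 1, function (row) { seen.push(row[1]); }, FIVE_MINUTES_MS);
// next === demoRows.length means every row was handled in this run.
```

In Apps Script, the returned index could be saved between runs (for example in script properties), and a time-driven trigger could call the function again with that index as `startIndex`.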


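Answer 1 (score: 0)

I suspect you have already found a solution to your problem, but for anyone else who stumbles upon this question as I did, there is an easy way to speed up these requests. The OP was nearly there...

Using the code provided: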

function googleSheetsToMySQL() {

  var sheetName = 'name_of_google_sheet';

  var dbAddress = 'database_ip_address';
  var dbUser = 'database_user_name';
  var dbPassword = 'database_user_password';
  var dbName = 'database_name';
  var dbTableName = 'database_table_name';

  var dbURL = 'jdbc:mysql://' + dbAddress + '/' + dbName;

  // Regarding the statement used by the OP, you might find something like....
  //
  // "INSERT INTO " + dbTableName + " (recid, code, product_description, price) VALUES (?, ?, ?, ?);";
  //
  // to be more practical if you're trying to implement the OP's code, 
  // as you are unlikely to have a stored procedure named 'sp_googlesheetstotable', or may be more 
  // familiar with basic queries like INSERT, UPDATE, or SELECT

  var sql = "{call [dbo].[sp_googlesheetstotable](?,?,?,?)}";

  // The more records/requests you load into the statement object, the longer it will take to process,
  // which may mean you exceed the execution time before you can do any post processing.
  //
  // For example, you may want to record the last row you exported in the event the export must be halted
  // prematurely. You could create a series of Triggers to re-initiate the export, picking up right where
  // you left off.
  //
  // The other consideration is that you want your GAS memory utilization to remain as low as possible to
  // keep things running smoothly and quickly, so try to strike a balance that fits the data you're
  // working with.

  var maxRecordsPerBatch = 1000;

  var spreadsheet = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = spreadsheet.getSheetByName(sheetName);

  var sheetData = sheet.getDataRange().getValues();

  var dbConnection = Jdbc.getConnection(dbURL, dbUser, dbPassword);

  // The following only needs to be set when you are changing the statement that needs to be prepared
  // or when you need to reset the variable.
  //
  // For example, if you were to switch to a different sheet which may have different values, columns,
  // structure, and/or target database table.

  var dbStatement = dbConnection.prepareCall(sql);

  var RecId;
  var Code;
  var ProductDescription;
  var Price;

  var recordCounter = 0;
  var lastRow;

  dbConnection.setAutoCommit(false);

  for (var i = 1; i < sheetData.length; i++) {

    lastRow = (i + 1 === sheetData.length);

    RecId = sheetData[i][0];
    Code = sheetData[i][1];
    ProductDescription = sheetData[i][2];
    Price = sheetData[i][3];

    dbStatement.setString(1, RecId);
    dbStatement.setString(2, Code);
    dbStatement.setString(3, ProductDescription);
    dbStatement.setString(4, Price);

    // This command takes what has been set above and adds the request to the array that will be sent 
    // to the database for processing.

    dbStatement.addBatch();

    recordCounter += 1;

    if (recordCounter == maxRecordsPerBatch || lastRow)
    {
      try {
        dbStatement.executeBatch();
      }
      catch(e)
      {
        console.log('Attempted to update TABLE `' + dbTableName + '` in DB `' + dbName + '`, but the following error was returned: ' + e);
      }

      if (!lastRow)
      { // Reset vars
        dbStatement = dbConnection.prepareCall( sql ); // Better to reset this variable to avoid any potential "No operations allowed after statement closed" errors
        recordCounter = 0;
      }
    }
  }

  dbConnection.commit();
  dbStatement.close();
  dbConnection.close();
}

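The OP may still run into the execution time limit (I have fewer than 10,000 records), but you should avoid executing a batch for each individual request unless you are having trouble locating a problem row.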
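
Another way to cut the number of round trips, separate from the batching shown above, is to send several rows in a single INSERT statement, which MySQL supports through a multi-row VALUES list. This is a minimal sketch and not the approach used in this answer: `buildMultiRowInsert` and the table and column names are hypothetical, and the identifiers are assumed to be trusted constants because they are concatenated into the SQL text.

```javascript
// Hypothetical helper: build one parameterized INSERT covering several rows.
// Returns the SQL text plus the values flattened in placeholder order.
function buildMultiRowInsert(tableName, columns, rows) {
  if (rows.length === 0) {
    throw new Error('buildMultiRowInsert needs at least one row');
  }
  var placeholders = '(' + columns.map(function () { return '?'; }).join(', ') + ')';
  var groups = [];
  var params = [];
  rows.forEach(function (row) {
    groups.push(placeholders);
    for (var c = 0; c < columns.length; c++) {
      params.push(String(row[c]));   // convert up front so every value can be bound as text
    }
  });
  var sql = 'INSERT INTO ' + tableName + ' (' + columns.join(', ') + ') VALUES ' + groups.join(', ');
  return { sql: sql, params: params };
}

var insert = buildMultiRowInsert(
  'product',
  ['recid', 'code', 'product_description', 'price'],
  [[1, 'A1', 'Widget', 9.99], [2, 'B2', 'Gadget', 19.5]]
);
// insert.sql contains two "(?, ?, ?, ?)" groups; insert.params holds 8 strings.
```

With the Jdbc service, the resulting SQL could be passed to `prepareStatement` and each entry of `params` bound with `setString(index + 1, value)`, using the same chunk size idea as `maxRecordsPerBatch` to keep each statement reasonably small.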

From this link

  

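It is important to keep in mind that each update added to a Statement or PreparedStatement is executed separately by the database. That means some of them may succeed before one of them fails. All the statements that have succeeded are now applied to the database, but the rest of the updates may not be. This can result in inconsistent data in the database.

To avoid this, you can execute the batch update inside a JDBC transaction. When executed inside a transaction, you can make sure that either all updates are executed or none are. Any successful updates can be rolled back in case one of the updates fails.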
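
The transactional approach described in the quote can be sketched as follows. This is an illustration under assumptions: `executeBatchInTransaction` is a hypothetical helper, and `connection` and `statement` are assumed to expose the same `setAutoCommit`, `commit`, `rollback`, `setString`, `addBatch`, and `executeBatch` methods as the Jdbc objects used in the code above. The stub objects at the bottom exist only so the control flow can be exercised outside Apps Script.

```javascript
// Hypothetical helper: run one batch so that either every row is committed or none is.
function executeBatchInTransaction(connection, statement, rows) {
  connection.setAutoCommit(false);
  try {
    rows.forEach(function (row) {
      for (var c = 0; c < row.length; c++) {
        statement.setString(c + 1, String(row[c]));  // JDBC parameters are 1-based
      }
      statement.addBatch();
    });
    statement.executeBatch();
    connection.commit();
    return true;
  } catch (e) {
    connection.rollback();   // undo any updates that succeeded before the failure
    return false;
  }
}

// Stand-in objects (not the real Jdbc service) that record which calls were made.
function makeStubs(failOnExecute) {
  var calls = [];
  return {
    calls: calls,
    connection: {
      setAutoCommit: function (value) { calls.push('autoCommit=' + value); },
      commit: function () { calls.push('commit'); },
      rollback: function () { calls.push('rollback'); }
    },
    statement: {
      setString: function () {},
      addBatch: function () { calls.push('addBatch'); },
      executeBatch: function () {
        if (failOnExecute) { throw new Error('simulated failure'); }
        calls.push('executeBatch');
      }
    }
  };
}

var ok = makeStubs(false);
var okResult = executeBatchInTransaction(ok.connection, ok.statement, [[1, 'A'], [2, 'B']]);
var bad = makeStubs(true);
var badResult = executeBatchInTransaction(bad.connection, bad.statement, [[1, 'A']]);
```

Returning a boolean lets the caller decide whether to record the last exported row or retry the chunk, rather than committing after a failed batch.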

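Alternative solution

If the time limit is a problem, you could try accessing the data in the sheet from outside Apps Script. I have copied the basic instructions here for posterity's sake, but please visit the link if it still works.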

Link to source

  
      
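  1. Update composer.json to require "google/apiclient": "^2.0" and run composer update.
  2. Create a project at https://console.developers.google.com/apis/dashboard.
  3. Click Enable APIs and enable the Google Sheets API.
  4. Go to Credentials, click Create credentials, and select Service account key.
  5. Choose "New service account" in the dropdown. Give the account a name; anything is fine.
  6. For Role, I selected Project -> Service Account Actor.
  7. For Key type, choose JSON (the default) and download the file. This file contains a private key, so be very careful with it; it is your credentials, after all.
  8. Finally, edit the sharing permissions for the spreadsheet you want to access and share either View (if you only want to read the file) or Edit (if you need read/write) access with the client_email address, which you can find in the JSON file.
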
<?php
require __DIR__ . '/vendor/autoload.php';


/*
 * We need to get a Google_Client object first to handle auth and api calls, etc.
 */
$client = new \Google_Client();
$client->setApplicationName('My PHP App');
$client->setScopes([\Google_Service_Sheets::SPREADSHEETS]);
$client->setAccessType('offline');

/*
 * The JSON auth file can be provided to the Google Client in two ways, one is as a string which is assumed to be the
 * path to the json file. This is a nice way to keep the creds out of the environment.
 *
 * The second option is as an array. For this example I'll pull the JSON from an environment variable, decode it, and
 * pass along.
 */
$jsonAuth = getenv('JSON_AUTH');
$client->setAuthConfig(json_decode($jsonAuth, true));

/*
 * With the Google_Client we can get a Google_Service_Sheets service object to interact with sheets
 */
$sheets = new \Google_Service_Sheets($client);

/*
 * To read data from a sheet we need the spreadsheet ID and the range of data we want to retrieve.
 * Range is defined using A1 notation, see https://developers.google.com/sheets/api/guides/concepts#a1_notation
 */
$data = [];

// The first row contains the column titles, so let's start pulling data from row 2
$currentRow = 2;

// The range of A2:H will get columns A through H and all rows starting from row 2
$spreadsheetId = getenv('SPREADSHEET_ID');
$range = 'A2:H';
$rows = $sheets->spreadsheets_values->get($spreadsheetId, $range, ['majorDimension' => 'ROWS']);
if (isset($rows['values'])) {
    foreach ($rows['values'] as $row) {
        /*
         * If the first column is empty, treat it as the end of the data and stop (this is just for example)
         */
        if (empty($row[0])) {
            break;
        }

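        // The API omits trailing empty cells, so pad short rows to 8 columns
        $row = array_pad($row, 8, '');

        $data[] = [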
            'col-a' => $row[0],
            'col-b' => $row[1],
            'col-c' => $row[2],
            'col-d' => $row[3],
            'col-e' => $row[4],
            'col-f' => $row[5],
            'col-g' => $row[6],
            'col-h' => $row[7],
        ];

        /*
         * Now for each row we've seen, let's update the I column with the current date
         */
        $updateRange = 'I'.$currentRow;
        $updateBody = new \Google_Service_Sheets_ValueRange([
            'range' => $updateRange,
            'majorDimension' => 'ROWS',
            // values is a list of rows, each row being a list of cell values
            'values' => [[date('c')]],
        ]);
        $sheets->spreadsheets_values->update(
            $spreadsheetId,
            $updateRange,
            $updateBody,
            ['valueInputOption' => 'USER_ENTERED']
        );

        $currentRow++;
    }
}

print_r($data);
/* Output:
Array
(
    [0] => Array
        (
            [col-a] => 123
            [col-b] => test
            [col-c] => user
            [col-d] => test user
            [col-e] => usertest
            [col-f] => email@domain.com
            [col-g] => yes
            [col-h] => no
        )

    [1] => Array
        (
            [col-a] => 1234
            [col-b] => another
            [col-c] => user
            [col-d] =>
            [col-e] => another
            [col-f] => another@eom.com
            [col-g] => no
            [col-h] => yes
        )

)
 */