BigQuery "Missing close double quote (")" error, but I'm not uploading any quotes

Asked: 2017-05-03 17:07:38

Tags: google-apps-script google-bigquery

I'm uploading some Facebook post data to BigQuery, so I have the basic fields: post name, post message, reach, likes, and so on...

I've already cleaned all the post names and post messages by removing every ", but I still get the following error:

file-00000000: Error detected while parsing row starting at position: 15934. Error: Missing close double quote (") character. (error code: invalid)

"之外的其他内容是否会导致此错误?

I'm exporting the data from a Google Sheet to BQ, so here is my script in case it helps:

function BQ_fb_export() {
  var projectId = 'XXXXX';
  var fileId = 'XXXXXXX';
  var tableId = 'XXXXXXX';

  // Define our load job.
  var jobSpec = {
    configuration: {
      load: {
        destinationTable: {
          projectId: projectId,
          datasetId: 'Facebook',
          tableId: tableId
        },
        allowJaggedRows: true,
        writeDisposition: 'WRITE_TRUNCATE',
        schema: {
          fields: [
            {name: 'Page_ID', type: 'STRING'},
            {name: 'Post_ID', type: 'STRING'},
            {name: 'Post_creation_date', type: 'STRING'},
            {name: 'Post_name', type: 'STRING'},
            {name: 'Post_message', type: 'STRING'},
            {name: 'Link_to_post', type: 'STRING'},
            {name: 'Post_shared_link', type: 'STRING'},
            {name: 'Post_type', type: 'STRING'},
            {name: 'Post_reach', type: 'INTEGER'},
            {name: 'Post_organic_reach', type: 'INTEGER'},
            {name: 'Post_paid_reach', type: 'INTEGER'},
            {name: 'Post_viral_reach', type: 'INTEGER'},
            {name: 'Post_engaged_users', type: 'INTEGER'},
            {name: 'Post_likes', type: 'INTEGER'},
            {name: 'Post_shares', type: 'INTEGER'},
            {name: 'Post_comments', type: 'INTEGER'},
            {name: 'Post_link_clicks', type: 'INTEGER'},
            {name: 'Video_views', type: 'INTEGER'}
          ]
        }
      }
    }
  };

  var spreadsheet = SpreadsheetApp.openById(fileId);
  var filename = spreadsheet.getName();

  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName("Raw_data");
  var Row_count = sheet.getLastRow();
  var data = sheet.getDataRange().getValues();

  var csvdata = "";
  for (var row = 1; row < data.length && row < Row_count + 1; row++) {
    for (var col = 0; col < data[row].length; col++) {
      var punctRE = /[\u2000-\u206F\u2E00-\u2E7F\\'!"#$%&()*+,\-.\/:;<=>?@\[\]^_`{|}~]/g;
      var spaceRE = /\s+/g;
      var cell = data[row][col].toString();

      if (cell.match(/http/g) && !cell.match(/www.facebook.com/g)) {
        cell = data[row][col].toString();
      } else if (!cell.match(/www.facebook.com/g)) {
        cell = data[row][col].toString().replace(punctRE, '').replace(spaceRE, ' ');
      }

      if (cell.indexOf(",") != -1) {
        csvdata += "\"" + cell + "\"";
      } else {
        csvdata += cell;
      }

      if (col < data[row].length - 1) {
        csvdata += ",";
      }
    }
    csvdata += "\r\n";
  }

  Logger.log(csvdata);
  var data = Utilities.newBlob(csvdata, "application/octet-stream");

  // Execute the job.
  BigQuery.Jobs.insert(jobSpec, projectId, data);
  // This script assumes there is a sheet named "Raw_data"
}

2 Answers:

Answer 0 (score: 1):

I finally figured out that the problem was about line breaks, not the double quotes the error suggests. So I also removed all line breaks from my post name and post message columns, and it worked perfectly. This is my cleaning variable now:

var punctRE = /[\u2000-\u206F\u2E00-\u2E7F\\'!"#$%&()*+,\-.\/:;<=>?@\[\]^_`{|}~\r\n|\n|\r]/g;

Hope it helps someone!
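
For reference, here is a minimal sketch of how that cleaning variable could be applied to each cell before it is appended to the CSV string. The cleanCell helper name is hypothetical, and it assumes the same loop structure as the script in the question:

// Hypothetical helper: strip punctuation, quotes and line breaks from one cell,
// then collapse the remaining whitespace, before the cell goes into the CSV.
var punctRE = /[\u2000-\u206F\u2E00-\u2E7F\\'!"#$%&()*+,\-.\/:;<=>?@\[\]^_`{|}~\r\n|\n|\r]/g;
var spaceRE = /\s+/g;

function cleanCell(value) {
  return value.toString().replace(punctRE, '').replace(spaceRE, ' ');
}

// e.g. cleanCell('line one\nline "two"') returns 'line oneline two'

In the question's script, the chained .replace() calls inside the else-if branch are where this cleaning would be applied.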

Answer 1 (score: 1):

In my case, adding the following line to the Python code did the trick (full code below). From the docs -> allow_quoted_newlines: indicates whether quoted data sections in a CSV file are allowed to contain newline characters. The default value is false.

allow_quoted_newlines = True

For the bq load command, add the following:

bq load --allow_quoted_newlines < rest of command >

Python code for loading the data into BigQuery:

from google.cloud import bigquery
from google.api_core.exceptions import BadRequest
client = bigquery.Client('project-id-b4f8d566')
dataset_id = 'census-ds'

dataset_ref = client.dataset(dataset_id)
job_config = bigquery.LoadJobConfig()
job_config.allow_quoted_newlines = True
job_config.schema = [
    bigquery.SchemaField("id", "INTEGER","REQUIRED"),
    bigquery.SchemaField("code", "STRING" ,"NULLABLE"),
    bigquery.SchemaField("answer", "STRING","NULLABLE")
]
job_config.skip_leading_rows = 1
# The source format defaults to CSV, so the line below is optional.
job_config.source_format = bigquery.SourceFormat.CSV
uri = "gs://mybucket/text.csv"

load_job = client.load_table_from_uri(
    uri, dataset_ref.table("census_text"), job_config=job_config
)  # API request
print("Starting job {}".format(load_job.job_id))

try:
    load_job.result()  # Waits for table load to complete.
    print("Job finished.")

    destination_table = client.get_table(dataset_ref.table("census_text"))
    print("Loaded {} rows.".format(destination_table.num_rows))
except BadRequest:
    for error in load_job.errors:
        print('ERROR: {}'.format(error['message']))

Docs: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#csv-options