The to_gbq() method in the pandas-gbq library strips the seconds and milliseconds from datetime fields

Time: 2018-03-16 14:33:55

Tags: python pandas google-bigquery

I am using version 0.3.1. The simplest way to reproduce the problem:

import datetime
import pandas as pd, pandas_gbq
data = {"date": datetime.datetime.now()}  # e.g. 2018-03-16 16:09:03.230384
df = pd.DataFrame(data, index=[0])

destination_table = 'test_tables.test_datetime'
project_id = 'my-project-11111'
private_key = 'path-to-key.json'
pandas_gbq.to_gbq(
  df,
  destination_table,
  project_id,
  private_key=private_key
)

When I check the value in the table created in Google BigQuery, the seconds and milliseconds are not kept as part of the DateTime: 2018-03-16 16:09:00.000 UTC

The documentation does not mention this behavior, so I assume it is a bug. But maybe I am missing something here?
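To make the size of the loss concrete, the two values quoted above can be compared directly. This is a minimal sketch using only the standard library; it assumes both values can be treated as naive datetimes in the same time zone, and `truncate_to_minute` is a hypothetical helper introduced only for illustration.

```python
import datetime

# Values copied from the question; treating both as naive datetimes
# in the same time zone is an assumption.
original = datetime.datetime(2018, 3, 16, 16, 9, 3, 230384)
stored = datetime.datetime(2018, 3, 16, 16, 9, 0)


def truncate_to_minute(value):
    """Hypothetical helper: zero out the seconds and microseconds."""
    return value.replace(second=0, microsecond=0)


# Difference between what was uploaded and what BigQuery shows.
lost = original - stored
print(lost)

# True if the stored value equals the original truncated to the minute.
print(truncate_to_minute(original) == stored)
```

Under that assumption, the stored value matches the original truncated to the minute, so both the seconds and the fractional part are dropped, not just the milliseconds.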

1 Answer:

Answer 0: (score: 1)

I have reproduced your issue and decided to compare the pandas_gbq method with the classic BigQuery Python client library:

from google.cloud import bigquery
import pandas as pd, pandas_gbq, datetime
data = {"date": datetime.datetime.now()}
print(data)
df = pd.DataFrame(data, index=[0])
print(df)

destination_table = 'test_tables.test_datetime'
project_id = 'example-project'
private_key = '/home/Workspace/example-service-account.json'
pandas_gbq.to_gbq(
  df,
  destination_table,
  project_id,
  private_key=private_key,
  if_exists='append'
)
# ^What gets inserted with the above method loses the milliseconds^


# BigQuery client library attempt

bigquery_client = bigquery.Client()

dataset_id = 'test_tables'
dataset_ref = bigquery_client.dataset(dataset_id)
table_ref = dataset_ref.table('test_datetime')

# Write the CSV without the index column so it matches the single-column table schema
df.to_csv('/home/Workspace/test.csv', index=False)

with open('/home/Workspace/test.csv', 'rb') as source_file:
  job_config = bigquery.LoadJobConfig()
  job_config.source_format = bigquery.SourceFormat.CSV
  job_config.skip_leading_rows = 1  # skip the header row written by to_csv
  job = bigquery_client.load_table_from_file(source_file, table_ref, job_config=job_config)

job.result()  # wait for the load job to complete
print('Loaded {} rows into {}:{}.'.format(job.output_rows, dataset_id, 'test_datetime'))
# ^The timestamps inserted from the *.csv by the client library keep their millisecond information^

I think this is an issue isolated to the pandas_gbq library, so I suggest searching for it, or reporting it, on the GitHub issue tracker.
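The comparison above relies on the DataFrame, and the CSV written from it, still carrying the fractional seconds before anything is uploaded. The following minimal local check needs only pandas and no BigQuery access. It uses the timestamp from the question as a fixed value instead of now() so the result is repeatable, and writes to an in-memory buffer rather than the file path used above (both are choices made for this sketch).

```python
import datetime
import io

import pandas as pd

# Fixed timestamp taken from the question, used instead of now().
stamp = datetime.datetime(2018, 3, 16, 16, 9, 3, 230384)
df = pd.DataFrame({"date": stamp}, index=[0])

# Value as held in the DataFrame column.
in_frame = df["date"].iloc[0]

# Serialise to an in-memory CSV without the index column.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

print(in_frame.microsecond)
print(csv_text)
```

If the microseconds show up in both places, the precision is intact on the pandas side, which is consistent with the loss happening during the pandas_gbq upload step.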