I am using version 0.3.1. The simplest way to reproduce it:
import pandas as pd, pandas_gbq, datetime
data = {"date": datetime.datetime.now()}  # 2018-03-16 16:09:03.230384
df = pd.DataFrame(data, index=[0])
destination_table = 'test_tables.test_datetime'
project_id = 'my-project-11111'
private_key = 'path-to-key.json'
pandas_gbq.to_gbq(
    df,
    destination_table,
    project_id,
    private_key=private_key
)
When I check the value in the table created in Google BigQuery, the value does not keep the seconds and milliseconds as part of the DATETIME:
2018-03-16 16:09:00.000 UTC
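For reference, a minimal sketch of one way to read the stored row back for inspection (my addition, assuming pandas_gbq.read_gbq with the same project and key as above):

# Sketch: query the table to see what BigQuery actually stored
# (assumes the same project_id / private_key as in the snippet above).
result = pandas_gbq.read_gbq(
    'SELECT * FROM test_tables.test_datetime',
    project_id,
    private_key=private_key,
    dialect='standard'
)
print(result)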
The documentation does not mention this behavior, so I assume it is a bug. Or am I missing something here?
Answer 0 (score: 1)
I have reproduced your issue and decided to compare the pandas_gbq method with the classic bigquery Python client library:
from google.cloud import bigquery
import pandas as pd, pandas_gbq, datetime
data = {"date": datetime.datetime.now()}
print(data)
df = pd.DataFrame(data, index=[0])
print(df)
destination_table = 'test_tables.test_datetime'
project_id = 'example-project'
private_key = '/home/Workspace/example-service-account.json'
pandas_gbq.to_gbq(
    df,
    destination_table,
    project_id,
    private_key=private_key,
    if_exists='append'
)
# ^What gets inserted with the above method loses the milliseconds^
# BigQuery client library attempt
bigquery_client = bigquery.Client()
dataset_id = 'test_tables'
dataset_ref = bigquery_client.dataset(dataset_id)
table_ref = dataset_ref.table('test_datetime')
df.to_csv('/home/Workspace/test.csv')
# You need to clean up the CSV so that it respects the table schema (see the sketch after this listing)
with open('/home/Workspace/test.csv', 'rb') as source_file:
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = 'CSV'
    job = bigquery_client.load_table_from_file(source_file, table_ref, job_config=job_config)
    job.result()
print('Loaded {} rows into {}:{}.'.format(job.output_rows, dataset_id, 'test_datetime'))
# ^The timestamps inserted from the *.csv by the client library keep their millisecond information^
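One way to do the cleanup mentioned in the comment above is to let pandas write the file without the index and header in the first place (a sketch, assuming the target table has exactly the DataFrame's data columns):

# Sketch of the CSV cleanup: write only the data columns, without the
# pandas index and without a header row, so each line of the file maps
# directly onto the BigQuery table schema.
df.to_csv('/home/Workspace/test.csv', index=False, header=False)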
I think this is an issue isolated to the pandas_gbq library, so I would suggest posting it to (or searching for it in) the Github issue tracker.
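Until that is resolved upstream, a possible stopgap sketch (my assumption, not something pandas_gbq documents: storing the column as an ISO-8601 STRING in a separate, hypothetical table is acceptable):

# Stopgap sketch (assumption: a STRING column in a separate, hypothetical
# table 'test_tables.test_datetime_str' is acceptable until the precision
# issue is fixed): serialize the datetimes to ISO-8601 text so the
# microseconds survive the upload unchanged.
df['date'] = df['date'].apply(lambda ts: ts.isoformat())
pandas_gbq.to_gbq(
    df,
    'test_tables.test_datetime_str',
    project_id,
    private_key=private_key,
    if_exists='replace'
)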