How to convert a pandas column to the BigQuery table DATE format

Time: 2018-10-12 17:10:29

Tags: python datetime google-bigquery

I have a pandas DataFrame with a column whose dates look like this:

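PublishDate = 2018-08-31

I use the pandas to_gbq() function to dump the data into a BigQuery table. Before dumping the data, I want to make sure the column formats match the table schema. In the BigQuery table, PublishDate is a date-only (DATE) column. How can I achieve something like the following: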

     df['PublishDate'] = df['PublishDate'].astype('?????')

I tried datetime64[D] and

     df['PublishDate'] = pd.to_datetime(df['PublishDate'], format='%Y-%m-%d', errors='coerce').dt.date
     df['PublishDate'] = [time.to_date() for time in df['PublishDate']]

but neither of these worked!
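For context, here is what those attempts actually produce. pandas has no built-in date-only dtype: pd.to_datetime yields a datetime64 column (full timestamps), while .dt.date yields an object column holding Python datetime.date values. As far as I know, the pandas-gbq versions of that time generated the table schema from dtypes alone, so neither form became DATE automatically. A minimal sketch, assuming only pandas and a hypothetical two-row sample:

```python
import datetime

import pandas as pd

# Hypothetical sample data shaped like the question's column
df = pd.DataFrame({"PublishDate": ["2018-08-31", "2018-09-01"]})

# pd.to_datetime gives a datetime64 column: full timestamps, not calendar dates
as_timestamp = pd.to_datetime(df["PublishDate"], format="%Y-%m-%d", errors="coerce")

# .dt.date gives Python datetime.date objects stored in an object-dtype column
as_date = as_timestamp.dt.date

first = as_date.iloc[0]
is_plain_date = isinstance(first, datetime.date) and not isinstance(first, datetime.datetime)

print(as_timestamp.dtype, as_date.dtype, is_plain_date)
```

So the values themselves are fine; the problem is that the column type alone does not tell the uploader "DATE", which is what the answers below work around.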

4 answers:

Answer 0 (score: 1)

AFAIK, pandas-gbq doesn't seem to have support for the DATE type. So the best option is probably to upload the column as a TIMESTAMP and then convert it to DATE with a SQL query that rewrites the table.

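import pandas as pd
from google.cloud import bigquery

# Parse the string column into a datetime64 helper column before uploading
df['PublishTimestamp'] = pd.to_datetime(
    df['PublishDate'],
    format='%Y-%m-%d',
    errors='coerce'
)
df.to_gbq("YOUR-DATASET.YOUR-TABLE", project_id="YOUR-PROJECT")

client = bigquery.Client()

# Write the query result back over the same table
job_config = bigquery.QueryJobConfig()
table_ref = bigquery.TableReference.from_string("YOUR-PROJECT.YOUR-DATASET.YOUR-TABLE")
job_config.destination = table_ref
job_config.write_disposition = "WRITE_TRUNCATE"

# Drop the original STRING column and the helper column, then add
# PublishDate back as a DATE (selecting both would duplicate the name)
sql = """
    SELECT
      * EXCEPT (PublishDate, PublishTimestamp),
      DATE(PublishTimestamp) AS PublishDate
    FROM
      `YOUR-PROJECT.YOUR-DATASET.YOUR-TABLE`
"""

query_job = client.query(
    sql,
    job_config=job_config
)
query_job.result()  # wait for the query to finish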
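One caveat with errors='coerce' in this approach: strings that don't match the format silently become NaT and end up as NULL in BigQuery. A small sketch, assuming only pandas (the helper name parse_publish_dates is hypothetical), that parses the column and also returns the rows that failed, so they can be inspected before calling to_gbq:

```python
import pandas as pd


def parse_publish_dates(
    df: pd.DataFrame, column: str = "PublishDate"
) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (copy with a PublishTimestamp column, rows whose date failed to parse)."""
    out = df.copy()
    out["PublishTimestamp"] = pd.to_datetime(
        out[column], format="%Y-%m-%d", errors="coerce"
    )
    # errors="coerce" turns unparseable strings into NaT, which would upload as NULL;
    # keep only rows that had a value but could not be parsed
    failed = out[out["PublishTimestamp"].isna() & out[column].notna()]
    return out, failed


# Usage with hypothetical sample data
df = pd.DataFrame({"PublishDate": ["2018-08-31", "31/08/2018", None]})
parsed, failed = parse_publish_dates(df)
print(failed)
```

Missing values (None) are not reported as failures, since they would be NULL either way.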

Answer 1 (score: 1)

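I ran into the same problem

and found that, according to the documentation, you can provide

table_schema : list of dicts, optional

So in my case, adding

table_schema = [{'name':'execution_date','type': 'DATE'}]

worked.

The full line: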

 import pandas_gbq as pdg
 pdg.to_gbq(table_for_uploading, upload_table, project_id=project_id, if_exists='replace', credentials=gbq_credentials, table_schema=[{'name': 'execution_date', 'type': 'DATE'}])
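To reuse this for several date columns, a sketch that builds the partial table_schema from a list of column names and passes it to pandas_gbq.to_gbq. It assumes, as this answer reports, that pandas-gbq infers the types of any columns not listed in table_schema; the helper names date_schema_for and upload_with_dates are hypothetical.

```python
import pandas as pd


def date_schema_for(df: pd.DataFrame, date_columns: list[str]) -> list[dict]:
    """Build a partial pandas-gbq table_schema marking the given columns as DATE."""
    missing = [col for col in date_columns if col not in df.columns]
    if missing:
        raise KeyError(f"columns not in DataFrame: {missing}")
    return [{"name": col, "type": "DATE"} for col in date_columns]


def upload_with_dates(df, destination_table, project_id, date_columns, credentials=None):
    """Replace destination_table with df, forcing date_columns to DATE."""
    # Imported here so date_schema_for can be used without pandas-gbq installed
    import pandas_gbq

    pandas_gbq.to_gbq(
        df,
        destination_table,
        project_id=project_id,
        if_exists="replace",
        credentials=credentials,
        table_schema=date_schema_for(df, date_columns),
    )
```

With the names from the line above, the call would be upload_with_dates(table_for_uploading, upload_table, project_id, ['execution_date'], gbq_credentials). Checking the column names up front catches a typo in the schema before any upload starts.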

Answer 2 (score: 0)

I couldn't find support for the DATE type in pandas-gbq either.

Another option is to insert the rows with the BigQuery client instead.
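A minimal sketch of that approach, assuming the destination table already exists with PublishDate declared as DATE. It converts each row to a JSON-ready dict with the date as a "YYYY-MM-DD" string and streams the rows with google-cloud-bigquery's Client.insert_rows_json. The helper names dataframe_to_json_rows and insert_dataframe, and the table ID format shown, are illustrative assumptions.

```python
import pandas as pd


def dataframe_to_json_rows(df: pd.DataFrame, date_column: str = "PublishDate") -> list[dict]:
    """Convert a DataFrame to JSON-ready row dicts with date_column as 'YYYY-MM-DD'."""
    parsed = pd.to_datetime(df[date_column], format="%Y-%m-%d", errors="coerce")
    records = df.to_dict(orient="records")
    for record, value in zip(records, parsed):
        # BigQuery accepts DATE values as 'YYYY-MM-DD' strings; NaT becomes None (NULL)
        record[date_column] = value.strftime("%Y-%m-%d") if pd.notna(value) else None
    return records


def insert_dataframe(df: pd.DataFrame, table_id: str) -> None:
    """Stream rows into an existing table, e.g. 'YOUR-PROJECT.YOUR-DATASET.YOUR-TABLE'."""
    # Imported here so the conversion helper above works without the client library
    from google.cloud import bigquery

    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, dataframe_to_json_rows(df))
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")
```

Because the table's schema already says DATE, nothing has to be inferred from pandas dtypes here; the trade-off is that streaming inserts require the table to exist beforehand.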

Answer 3 (score: 0)

Try this. It's only a workaround, and it doesn't use to_gbq.

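from google.cloud import bigquery

bigquery_client = bigquery.Client()
table_id = "YOUR-PROJECT.YOUR-DATASET.YOUR-TABLE"  # placeholder table ID
table_schema = [bigquery.SchemaField("PublishDate", "DATE")]  # force DATE
job_config = bigquery.LoadJobConfig(
    schema=table_schema, source_format=bigquery.SourceFormat.CSV
)
load_job = bigquery_client.load_table_from_dataframe(
    dataframe, table_id, job_config=job_config  # dataframe: your pandas DataFrame
)
load_job.result()  # wait for the load job to finish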