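I have a pandas DataFrame with a date column in the following format:
PublishDate = 2018-08-31
I use the pandas to_gbq() function to dump the data into a BigQuery table. Before dumping the data, I need to make sure the column formats match the table schema. In the BigQuery table, PublishDate is a DATE-only column. How can I achieve something like the following: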
df['PublishDate'] = df['PublishDate'].astype('?????')
I tried datetime64[D] and
df['PublishDate'] = pd.to_datetime(df['PublishDate'], format='%Y-%m-%d', errors='coerce').dt.date
df['PublishDate'] = [time.to_date() for time in df['PublishDate']]
But none of these worked!
Answer 0: (score: 1)
Afaik, pandas-gbq doesn't seem to have support for the DATE type. So the best option is probably to export the column as a TIMESTAMP and then convert it to a DATE with an SQL query.
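import pandas as pd
from google.cloud import bigquery

# Parse the strings into datetimes so the column is uploaded as TIMESTAMP.
df['PublishTimestamp'] = pd.to_datetime(
    df['PublishDate'],
    format='%Y-%m-%d',
    errors='coerce'
)
df.to_gbq("YOUR-DATASET.YOUR-TABLE", project_id="YOUR-PROJECT")

# Overwrite the table with the query result, which adds a DATE column.
client = bigquery.Client()
job_config = bigquery.QueryJobConfig()
table_ref = client.dataset("YOUR-DATASET").table("YOUR-TABLE")
job_config.destination = table_ref
job_config.write_disposition = "WRITE_TRUNCATE"
sql = """
SELECT
    -- Drop the uploaded string column so PublishDate is not duplicated.
    * EXCEPT (PublishDate),
    DATE(PublishTimestamp) AS PublishDate
FROM
    `YOUR-PROJECT.YOUR-DATASET.YOUR-TABLE`
"""
query_job = client.query(
    sql,
    job_config=job_config
)
query_job.result()  # wait for the query job to finish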
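Answer 1: (score: 1)
I ran into the same problem.
I found that, according to the documentation, you can provide
table_schema: list of dicts, optional
So in my case, adding
table_schema = [{'name':'execution_date','type': 'DATE'}]
worked.
The full line: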
pdg.to_gbq(table_for_uploading, upload_table, project_id=project_id, if_exists='replace', credentials=gbq_credentials,table_schema = [{'name':'execution_date','type': 'DATE'}])
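For more than one column, the table_schema list can be generated from a column-to-type mapping. The following is only a minimal sketch: build_table_schema and upload_with_schema are hypothetical helper names, not part of pandas-gbq, and the column name is the one from this answer. Only the table_schema and if_exists arguments of to_gbq come from the call above.

```python
def build_table_schema(column_types):
    """Turn {'column': 'BIGQUERY_TYPE'} into the list of dicts that
    to_gbq accepts as its table_schema argument."""
    return [
        {"name": name, "type": bq_type.upper()}
        for name, bq_type in column_types.items()
    ]


def upload_with_schema(df, destination_table, project_id, column_types):
    # Hypothetical wrapper. pandas_gbq is imported lazily so that the
    # schema helper above can be used without the library installed.
    import pandas_gbq

    pandas_gbq.to_gbq(
        df,
        destination_table,
        project_id=project_id,
        if_exists="replace",
        table_schema=build_table_schema(column_types),
    )


schema = build_table_schema({"execution_date": "date"})
print(schema)  # [{'name': 'execution_date', 'type': 'DATE'}]
```

Keeping the mapping in one place makes it easier to check against the BigQuery table definition before uploading.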
Answer 2: (score: 0)
I could not find support for the DATE type in pandas-gbq.
Another option is to insert the data with the BigQuery client.
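As a sketch of what inserting with the BigQuery client could look like (not necessarily what the answerer had in mind): rows are passed to insert_rows_json as plain dicts, with dates written as ISO strings such as '2018-08-31'. The helper names rows_to_json and insert_rows, and the Title column, are assumptions made for illustration.

```python
import datetime


def rows_to_json(records):
    """Replace date (and datetime) values with ISO 8601 strings so the
    rows are JSON-serializable."""
    json_rows = []
    for record in records:
        json_rows.append({
            key: value.isoformat() if isinstance(value, datetime.date) else value
            for key, value in record.items()
        })
    return json_rows


def insert_rows(table_id, records):
    # Lazy import so the conversion helper works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    # insert_rows_json returns a list of per-row errors (empty on success).
    errors = client.insert_rows_json(table_id, rows_to_json(records))
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")


rows = rows_to_json([{"PublishDate": datetime.date(2018, 8, 31), "Title": "example"}])
print(rows)  # [{'PublishDate': '2018-08-31', 'Title': 'example'}]
```

With a DataFrame, the records could come from df.to_dict("records") after converting the column with .dt.date.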
Answer 3: (score: 0)
Try this. It is just a workaround that does not use to_gbq.
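from google.cloud import bigquery

bigquery_client = bigquery.Client()
# Assumed schema for illustration; list the columns of your own table.
table_schema = [bigquery.SchemaField("PublishDate", "DATE")]
job_config = bigquery.LoadJobConfig(
    schema=table_schema, source_format=bigquery.SourceFormat.CSV
)
load_job = bigquery_client.load_table_from_dataframe(
    dataframe, table_id, job_config=job_config
)
load_job.result()  # wait for the load job to finish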