Question

We have a data processing pipeline built on:

pandas (0.23.4)
google.cloud.bigquery (1.5.0) and
pyarrow (0.10.0)

To head off problems such as unexpected type conversions, I am looking into how these libraries work internally.

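From the source code, I can see that loading a DataFrame with bigquery.client.load_table_from_dataframe() uses Parquet as the intermediate format: dataframe.to_parquet is called, and the resulting buffer is loaded into BigQuery.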

What I cannot work out is how reading query results into a DataFrame works. The source code of bigquery.client.query.to_dataframe() is as follows:

def to_dataframe(self):
    """Return a pandas DataFrame from a QueryJob

    Returns:
        A :class:`~pandas.DataFrame` populated with row data and column
        headers from the query results. The column headers are derived
        from the destination table's schema.

    Raises:
        ValueError: If the `pandas` library cannot be imported.
    """
    return self.result().to_dataframe()

I don't have much experience with classes in Python; afaik this means that to_dataframe() is defined somewhere higher up in the class hierarchy.

Tracing the hierarchy back leads to _AsyncJob, which is a child of google.api_core.future.polling.PollingFuture. But I cannot find the definition anywhere.
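One thing worth checking: the snippet above calls to_dataframe() on whatever self.result() returns, so the method being looked for may be defined on that returned object's class rather than on a parent of the job class. The sketch below illustrates the pattern with made-up classes (FakeRows, FakeBase and FakeQueryJob are hypothetical stand-ins, not BigQuery classes) and a small helper, also hypothetical, that reports which class in the method resolution order actually defines a given name.

```python
class FakeRows:
    """Hypothetical stand-in for the object returned by result()."""

    def to_dataframe(self):
        return "rows -> DataFrame"


class FakeBase:
    """Hypothetical stand-in for a parent class of the job."""

    def result(self):
        return FakeRows()


class FakeQueryJob(FakeBase):
    """Hypothetical stand-in for the job class."""

    def to_dataframe(self):
        # Same shape as the snippet above: delegate to the result object.
        return self.result().to_dataframe()


def defining_class(obj, name):
    """Return the first class in obj's MRO whose own namespace defines name."""
    for cls in type(obj).__mro__:
        if name in vars(cls):
            return cls
    return None


job = FakeQueryJob()
inner = job.result()

print(defining_class(job, "to_dataframe").__name__)    # FakeQueryJob
print(defining_class(inner, "to_dataframe").__name__)  # FakeRows
print(defining_class(job, "result").__name__)          # FakeBase
```

Applied to a real job object, the same idea would be to inspect type(job.result()) and then look up its source file, for example with inspect.getsourcefile(), instead of walking up the job's own parents.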

Any help?

Where is bigquery.client.query.to_dataframe defined?

0 Answers: