I have a DataFrame like the following:
val df = Seq(("x", "y", 1),("x", "z", 2),("x", "a", 4), ("x", "a", 5), ("t", "y", 1), ("t", "y2", 6), ("t", "y3", 3), ("t", "y4", 5)).toDF("F1", "F2", "F3")
+---+---+---+
| F1| F2| F3|
+---+---+---+
|  x|  y|  1|
|  x|  z|  2|
|  x|  a|  4|
|  x|  a|  5|
|  t|  y|  1|
|  t| y2|  6|
|  t| y3|  3|
|  t| y4|  5|
+---+---+---+
I filter it and select a value like this:
df.filter($"F1" === "x" && $"F2"==="y").head.getInt(2)
The above works, but the following throws an exception:
df.filter($"F1" === "x" && $"F2"==="y").head.getDouble(2)
In addition, the following breaks when the filtered DataFrame contains no records:
df.filter($"F1" === "x" && $"F2"==="y1").head.getAs[Int]("F3")
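So how can I call getAs[]() safely and get the value? Whether the value is an integer or a double, I always want it as a double, and if the filtered DataFrame is empty the result should be 0.0.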
Answer 0 (score: 1)
Don't use the dynamic API at all. Use the strongly typed API and cast the type explicitly:
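Usage example:
import org.apache.spark.sql.DataFrame
// Cast F3 to double explicitly, read it through the typed API,
// and fall back to 0.0 when no row matches.
def get(df: DataFrame) = df.select($"F3".cast("double").as[Double])
  .take(1).headOption.getOrElse(0.0)
get(df.filter($"F1" === "x" && $"F2" === "y"))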
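To see both cases from the question side by side, here is a minimal self-contained sketch of the same idea. It assumes a local SparkSession; the object name `SafeGetExample`, the helper name `firstF3OrZero`, and the app name are made up for illustration and are not part of the original answer. In spark-shell, `spark` and its implicits already exist, so only the helper is needed.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SafeGetExample {
  // Assumption: a throwaway local session, for illustration only.
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("safe-get-sketch")
    .getOrCreate()
  import spark.implicits._

  // Same sample data as in the question.
  val df: DataFrame = Seq(
    ("x", "y", 1), ("x", "z", 2), ("x", "a", 4), ("x", "a", 5),
    ("t", "y", 1), ("t", "y2", 6), ("t", "y3", 3), ("t", "y4", 5)
  ).toDF("F1", "F2", "F3")

  // Hypothetical helper: cast F3 to double, switch to a typed Dataset[Double],
  // fetch at most one row, and default to 0.0 if nothing matched.
  def firstF3OrZero(data: DataFrame): Double =
    data.select($"F3".cast("double")).as[Double]
      .take(1).headOption.getOrElse(0.0)

  def main(args: Array[String]): Unit = {
    val hit  = firstF3OrZero(df.filter($"F1" === "x" && $"F2" === "y"))
    val miss = firstF3OrZero(df.filter($"F1" === "x" && $"F2" === "y1"))
    println(s"hit=$hit, miss=$miss") // prints: hit=1.0, miss=0.0
    spark.stop()
  }
}
```

The explicit cast means the integer column is always read as a double, and `take(1).headOption` avoids the exception that `head` throws on an empty result.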