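I'm following the BQML tutorial, which shows how to predict a baby's birth weight from its sex, the length of the pregnancy, and demographic information about the mother.
When I run the SQL that evaluates the model, BigQuery returns the following error: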
Failure in computing PREDICT: Null value found in input.
Here is the evaluation SQL:
#standardSQL
SELECT
  *
FROM
  ML.EVALUATE(MODEL `bqml_tutorial.natality_model`,
    (
    SELECT
      weight_pounds,
      is_male,
      gestation_weeks,
      mother_age,
      CAST(mother_race AS STRING) AS mother_race
    FROM
      `bigquery-public-data.samples.natality`
    WHERE
      weight_pounds IS NOT NULL))
The SQL used to create the model is:
#standardSQL
CREATE MODEL `bqml_tutorial.natality_model`
OPTIONS
  (model_type='linear_reg',
    input_label_cols=['weight_pounds']) AS
SELECT
  weight_pounds,
  is_male,
  gestation_weeks,
  mother_age,
  CAST(mother_race AS STRING) AS mother_race
FROM
  `bigquery-public-data.samples.natality`
WHERE
  weight_pounds IS NOT NULL
  AND RAND() < 0.001
Interestingly, prediction works fine; the error only ever appears when I try to evaluate the model.
Any ideas?
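Answer 0 (score: 1)
To see where the problem comes from, you can run the query below, which counts the NULLs in each column fed to the model: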
#standardSQL
SELECT
  COUNTIF(weight_pounds IS NULL) weight_pounds_nulls,
  COUNTIF(is_male IS NULL) is_male_nulls,
  COUNTIF(gestation_weeks IS NULL) gestation_weeks_nulls,
  COUNTIF(mother_age IS NULL) mother_age_nulls,
  COUNTIF(mother_race IS NULL) mother_race_nulls
FROM (
  SELECT
    weight_pounds,
    is_male,
    gestation_weeks,
    mother_age,
    CAST(mother_race AS STRING) AS mother_race
  FROM `bigquery-public-data.samples.natality`
  WHERE weight_pounds IS NOT NULL
)
The result is:
Row  weight_pounds_nulls  is_male_nulls  gestation_weeks_nulls  mother_age_nulls  mother_race_nulls
1    0                    0              4749775                0                 9874846
So gestation_weeks and mother_race both contain NULLs. Exclude those rows when evaluating:
#standardSQL
SELECT
  *
FROM
  ML.EVALUATE(MODEL `bqml_tutorial.natality_model`,
    (
    SELECT
      weight_pounds,
      is_male,
      gestation_weeks,
      mother_age,
      CAST(mother_race AS STRING) AS mother_race
    FROM `bigquery-public-data.samples.natality`
    WHERE weight_pounds IS NOT NULL
      -- drop rows whose features are NULL (see the counts above)
      AND gestation_weeks IS NOT NULL
      AND mother_race IS NOT NULL
    ))
This produces the following evaluation:
Row  mean_absolute_error  mean_squared_error  mean_squared_log_error  median_absolute_error  r2_score              explained_variance
1    0.957266870271064    1.6762698039982795  0.03411192361406951     0.73998132611964       0.047271288906207354  0.04732780918772106
I think you should make the same adjustment for PREDICT.
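A minimal sketch of what that adjustment might look like, assuming the same `bqml_tutorial.natality_model` trained in the question. It reuses the evaluation query's feature filters inside ML.PREDICT; the `LIMIT 100` is only an illustrative choice to keep the output small, not part of the original advice.

```sql
#standardSQL
-- Sketch: apply the same NULL filters to ML.PREDICT as to ML.EVALUATE.
-- Assumes the model `bqml_tutorial.natality_model` from the question exists.
SELECT
  *
FROM
  ML.PREDICT(MODEL `bqml_tutorial.natality_model`,
    (
    SELECT
      weight_pounds,
      is_male,
      gestation_weeks,
      mother_age,
      CAST(mother_race AS STRING) AS mother_race
    FROM `bigquery-public-data.samples.natality`
    WHERE weight_pounds IS NOT NULL
      -- same feature filters as the evaluation query
      AND gestation_weeks IS NOT NULL
      AND mother_race IS NOT NULL
    LIMIT 100))  -- illustrative limit only
```

The output should contain the input columns plus a `predicted_weight_pounds` column, so you can compare predictions against the actual weights row by row.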
Answer 1 (score: 0)
BQML now fills in these NULLs for you automatically. Please try again with the original data (without the NOT NULL filters).