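Goal

Create a lasso model using grid search in tidymodels.

Use code similar or identical to Julia Silge's working code on another dataset that contains numeric and factor variables.

Problem

Error message (solved, see Edit)

`x` and `y` must have the same types and lengths

No valid results for LASSO

The standard deviation is zero in every bootstrap resample.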
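The wording of this warning matches the one base R's `cor()` emits when one of its inputs is constant. The following is a minimal sketch under an assumption the post does not confirm: that the warning comes from a correlation-based metric (such as R-squared) evaluated on constant predictions, which can happen when a large lasso penalty shrinks every coefficient to zero. The numbers are made up for illustration.

```r
# Hypothetical outcome values for illustration only; not from the post's data.
truth <- c(7.9, 8.4, 8.1, 8.8, 7.6)

# Assumption: a heavily penalised lasso predicts the same value everywhere.
constant_pred <- rep(8, length(truth))

# cor() cannot compute a correlation when one input has zero variance:
# it returns NA and raises a warning, which is captured here.
warning_text <- NULL
result <- withCallingHandlers(
  cor(truth, constant_pred),
  warning = function(w) {
    warning_text <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(result)        # NA
print(warning_text)  # the localized "standard deviation is zero" message
```

If this is indeed the mechanism, the warning would point to individual penalty values that are too large rather than to a failure of the whole tuning run.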

Code

# Data
## Optionally only numeric
data = data[sapply(data, is.numeric)]

# Workflow setup
## Recipe
rec = recipe(PD ~ ., data = data) %>%
   step_dummy(all_nominal(), -all_outcomes()) %>%
   step_normalize(all_numeric())

## Preparation of the recipe
prep = rec %>% prep()

## Workflow
wf = workflow() %>% add_recipe(rec)


# Lambda grid
lambdas = grid_regular(
   penalty(), 
   levels = 20)

# Bootstrap data
boot = bootstraps(data, times = 5)

# Model
model = linear_reg(
   penalty = tune(), 
   mixture = 1 # for lasso
) %>% set_engine('glmnet')

# Processing
lasso = tune_grid(
   wf %>% add_model(model), 
   resamples = boot, 
   grid = lambdas)

Error traceback

tune_grid(wf %>% add_model(model), resamples = boot, grid = lambdas)
tune_grid.workflow(wf %>% add_model(model), resamples = boot, grid = lambdas)
tune_grid_workflow(object, resamples = resamples, grid = grid, metrics = metrics, pset = param_info, control = control)
rlang::eval_tidy(code_path)
tune_mod_with_recipe(resamples, grid, object, metrics, control)
pull_metrics(resamples, results, control)
pulley(resamples, res, ".metrics")
full_join(resamples, pull_vals, by = id_cols)
full_join.tbl_df(resamples, pull_vals, by = id_cols)
`names<-`(`*tmp*`, value = vars$alias)
`names<-.rset`(`*tmp*`, value = vars$alias)
rset_reconstruct(out, x)
rset_reconstructable(x, to)
col_equals_splits(to_names)
vec_equal(x, "splits")

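Notes

Providing a single, specific lambda and fitting the model to the data works without error.

If the lambdas supplied by the grid cannot be fitted properly, how can the grid be changed?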
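One way to change the grid is to pass a custom range to `penalty()`. The sketch below assumes the dials package is installed; the range `c(-3, -1)` is an arbitrary illustration, not a value derived from the data in this post.

```r
library(dials)

# penalty() is defined on a log10 scale; its default range is c(-10, 0),
# i.e. lambda from 1e-10 to 1. The narrower range here is an assumed
# example that drops the largest and the smallest penalties.
lambda_grid <- grid_regular(penalty(range = c(-3, -1)), levels = 20)

# The grid is a tibble with one column, `penalty`, in natural units.
range(lambda_grid$penalty)
```

The resulting `lambda_grid` can be passed to the `grid` argument of `tune_grid()` in place of the default grid.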

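Using only numeric predictors results in the same error.

Edit

The error message "`x` and `y` must have the same types and lengths" can be avoided. Initially, dplyr 0.8.5 and rsample 0.0.7 were used. Downgrading rsample to 0.0.6 or upgrading dplyr to 1.0.0 resolved the issue (credit to Max Kuhn).
The LASSO still does not find a fit.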
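The version pairing described above can be checked with base R's version comparison. The helper below is hypothetical (it is not part of any package) and encodes only the single combination reported in this post as failing; whether other versions are affected is not known from the post.

```r
# Hypothetical helper: TRUE only for the pairing reported as failing,
# i.e. rsample 0.0.7 together with a dplyr older than 1.0.0.
is_reported_bad_combo <- function(dplyr_version, rsample_version) {
  package_version(dplyr_version) < package_version("1.0.0") &&
    package_version(rsample_version) == package_version("0.0.7")
}

is_reported_bad_combo("0.8.5", "0.0.7")  # pairing that raised the error
is_reported_bad_combo("0.8.5", "0.0.6")  # rsample downgraded
is_reported_bad_combo("1.0.0", "0.0.7")  # dplyr upgraded
```

To check an actual installation, pass `as.character(packageVersion("dplyr"))` and `as.character(packageVersion("rsample"))`.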

# Package imports ------
library(readr)
library(tidymodels)
#> ── Attaching packages ──────────────────────────────────────── tidymodels 0.1.0 ──
#> ✓ broom     0.5.6      ✓ recipes   0.1.12
#> ✓ dials     0.0.6      ✓ rsample   0.0.7 
#> ✓ dplyr     1.0.0      ✓ tibble    3.0.1 
#> ✓ ggplot2   3.3.1      ✓ tune      0.1.0 
#> ✓ infer     0.5.1      ✓ workflows 0.1.1 
#> ✓ parsnip   0.1.1      ✓ yardstick 0.0.6 
#> ✓ purrr     0.3.4
#> ── Conflicts ─────────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard()  masks scales::discard()
#> x dplyr::filter()   masks stats::filter()
#> x dplyr::lag()      masks stats::lag()
#> x ggplot2::margin() masks dials::margin()
#> x yardstick::spec() masks readr::spec()
#> x recipes::step()   masks stats::step()
library(reprex)

# Data ------
# Prepared according to the Blog post by Julia Silge
# https://juliasilge.com/blog/lasso-the-office/
urlfile = 'https://raw.githubusercontent.com/shudras/office_data/master/office_data.csv'
office = read_csv(url(urlfile))[-1]
#> Warning: Missing column names filled in: 'X1' [1]
#> Parsed with column specification:
#> cols(
#>   .default = col_double()
#> )
#> See spec(...) for full column specifications.

#office_split = initial_split(office, strata = season)
#office_train = training(office_split)
#office_test = testing(office_split)

# Lasso modeling -------
## Recipe and train it 
office_rec <- recipe(imdb_rating ~ ., data = office) %>%
  #
  step_zv(all_numeric(), -all_outcomes()) %>%
  step_normalize(all_numeric(), -all_outcomes()) %>%
  prep(strings_as_factors = FALSE) # Training

## Create workflow 
wf <- workflow() %>%
  add_recipe(office_rec)

## Parameter tuning 
set.seed(4653)
### Bootstrapping data for resampling
office_boot <- bootstraps(office, times = 5, strata = season)

### Create lambda search grid
lambda_grid <- grid_regular(penalty(), levels = 20)

### The model
tune_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

### Apply the workflow
lasso_grid <- tune_grid(
  wf %>% add_model(tune_spec),
  resamples = office_boot,
  grid = lambda_grid
)
#> ! Bootstrap1: internal: Standardabweichung ist Null
#> ! Bootstrap2: internal: Standardabweichung ist Null
#> ! Bootstrap3: internal: Standardabweichung ist Null
#> ! Bootstrap4: internal: Standardabweichung ist Null
#> ! Bootstrap5: internal: Standardabweichung ist Null

Created on 2020-06-12 by the reprex package (v0.3.0)

Lasso tidymodels error: `x` and `y` must have the same types and lengths
