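I'm trying to convert many strings into sequences of integers using a dictionary built from a corpus. With keras, for a single string: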
library(tidyverse)
library(keras)
tkn <- text_tokenizer(6)
fit_text_tokenizer(tkn, c("Hi everyone this is an example"))
list("this example to numbers hi","also hi bob") %>%
texts_to_sequences(tkn,.)
This returns:
[[1]]
[1] 3 1
[[2]]
[1] 1
It seems you can almost do the same thing with the text2vec package. The first half would be:
library(tidyverse)
library(tokenizers)
library(text2vec)
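# Note: despite the name, this object holds the vocabulary, not a vectorizer.
# The original call also passed ids = model_data[["id"]], but model_data is
# not defined in this snippet, so that argument is left out here.
vectorizer <- itoken(c("Hi everyone this is an example"),
                     preprocessor = stringi::stri_trans_tolower,
                     tokenizer = tokenize_words,
                     progressbar = FALSE) %>%
  create_vocabulary()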
But from there I can't figure out how to convert strings to numbers the way the texts_to_sequences function does. Am I missing something?
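For illustration, here is a minimal base R sketch of the lookup step being asked about: tokenise each text, then look each token up in a vocabulary with match(). The helper name texts_to_ids and the hard-coded vocab_terms vector are hypothetical and not part of keras or text2vec. With text2vec, the terms would presumably come from the term column of the vocabulary (e.g. vectorizer$term), whose ordering may differ from the keras word index, so the integers need not match.

```r
# Hypothetical helper (not a keras or text2vec function): map each text to
# the integer positions of its tokens in a vocabulary, dropping unknown tokens.
texts_to_ids <- function(texts, vocab_terms) {
  lapply(texts, function(txt) {
    # Lowercase, then split on runs of non-alphanumeric characters
    tokens <- strsplit(tolower(txt), "[^[:alnum:]]+")[[1]]
    # Position of each token in the vocabulary (NA if absent)
    ids <- match(tokens, vocab_terms)
    # Drop out-of-vocabulary tokens
    ids[!is.na(ids)]
  })
}

# Assumed vocabulary, in the order the words appear in the example sentence
vocab_terms <- c("hi", "everyone", "this", "is", "an", "example")

res <- texts_to_ids(list("this example to numbers hi", "also hi bob"),
                    vocab_terms)
print(res)
# [[1]] is 3 6 1 and [[2]] is 1
```

Unlike the keras output above, "example" is kept here (index 6), because text_tokenizer(6) only retains word indices below 6. Passing vocab_terms[1:5] instead should reproduce the 3 1 result.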