My challenge is to convert the number words "ten" and "one" in an input sentence into the numerals 10 and 1:
example_input <- paste0("I have ten apple and one orange")
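The numbers may change depending on the user's requirements, and the input sentence can be tokenized. The output I want to get is: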
my_output_toget<-paste("I have 10 apple and 1 orange")
Answer 0 (score: 5)
We can pass key/value pairs as the replacement argument of gsubfn to replace those words with numbers:
library(english)
library(gsubfn)
gsubfn("\\w+", setNames(as.list(1:10), as.english(1:10)), example_input)
#[1] "I have 10 apple and 1 orange"
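For readers without the english and gsubfn packages, the same key/value idea can be sketched in base R. This is only an illustration, not the answer's method: the `lookup` vector (including the extra "dozen" entry) and the helper name `replace_words` are hypothetical and chosen for this example.

```r
# Hypothetical key/value table: names are the words, values are the digits.
lookup <- c(one = "1", ten = "10", dozen = "12")

# Replace every word token that appears as a key in `lookup`.
replace_words <- function(x, lookup) {
  m <- gregexpr("\\w+", x)                 # locate all word tokens
  regmatches(x, m) <- lapply(regmatches(x, m), function(tokens) {
    hit <- tokens %in% names(lookup)       # tokens that are known keys
    tokens[hit] <- unname(lookup[tokens[hit]])
    tokens                                 # unmatched tokens stay as they are
  })
  x
}

replace_words("I have ten apple and one orange", lookup)
# [1] "I have 10 apple and 1 orange"
```

Because the tokens are found with a regular expression rather than by splitting on spaces, a word followed by punctuation (for example "ten,") is still matched.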
Answer 1 (score: 2)
The textclean package makes this task easy:
library(textclean)
mgsub(example_input, replace_number(seq_len(10)), seq_len(10))
[1] "I have 10 apple and 1 orange"
You only need to adjust the seq_len() argument to the largest number in your data.
Some examples:
example_input <- c("I have one hundred apple and one orange")
mgsub(example_input, replace_number(seq_len(100)), seq_len(100))
[1] "I have 100 apple and 1 orange"
# "tousand" is misspelled in this input, so only "one" is matched and replaced
example_input <- c("I have one tousand apple and one orange")
mgsub(example_input, replace_number(seq_len(1000)), seq_len(1000))
[1] "I have 1 tousand apple and 1 orange"
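If you don't know the largest number in advance, you can pick a sufficiently large one.
Answer 2 (score: 2)
I wrote an R package for this, https://github.com/fsingletonthorn/words_to_numbers, which should work for more use cases.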
devtools::install_github("fsingletonthorn/words_to_numbers")
library(wordstonumbers)
example_input <- "I have ten apple and one orange"
words_to_numbers(example_input)
[1] "I have 10 apple and 1 orange"
It also works for more complex cases, such as
words_to_numbers("The Library of Babel (by Jorge Luis Borges) describes a library that contains all possible four-hundred and ten page books made with a character set of twenty five characters (twenty two letters, as well as spaces, periods, and commas), with eighty lines per book and forty characters per line.")
#> [1] "The Library of Babel (by Jorge Luis Borges) describes a library that contains all possible 410 page books made with a character set of 25 characters (22 letters, as well as spaces, periods, and commas), with 80 lines per book and 40 characters per line."
or
words_to_numbers("300 billion, 2 hundred and 79 cats")
#> [1] "300000000279 cats"
Answer 3 (score: 1)
Less elegant than akrun's answer, but it uses only base R.
nums = c("one","two","three","four","five",
"six","seven","eight","nine","ten")
example_input <- paste0("I have ten apple and one orange")
aux = strsplit(example_input," ")[[1]]
aux[!is.na(match(aux,nums))]=na.omit(match(aux,nums))
example_output = paste(aux,collapse=" ")
example_output
[1] "I have 10 apple and 1 orange"
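We first split the sentence on spaces, find the tokens that match a number word, replace each one with its position in nums (which coincides with the number itself), and then paste the tokens back together.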
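The same split, match, and paste steps can be wrapped in a function that handles several sentences at once. The following is a minimal sketch under stated assumptions: the function name `words_to_digits` is hypothetical, matching is made case-insensitive with tolower(), and tokens are assumed to be separated by single spaces.

```r
nums <- c("one", "two", "three", "four", "five",
          "six", "seven", "eight", "nine", "ten")

# Hypothetical helper: apply split / match / paste to each sentence.
words_to_digits <- function(sentences, nums) {
  vapply(strsplit(sentences, " ", fixed = TRUE), function(tokens) {
    pos <- match(tolower(tokens), nums)       # NA where a token is not a number word
    tokens[!is.na(pos)] <- pos[!is.na(pos)]   # position in nums equals the number
    paste(tokens, collapse = " ")
  }, character(1))
}

words_to_digits(c("I have ten apple and one orange",
                  "Two cats and three dogs"), nums)
# [1] "I have 10 apple and 1 orange" "2 cats and 3 dogs"
```

Since the split is on spaces only, a number word with attached punctuation (such as "ten,") is not matched by this sketch.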