I have a data frame whose numbers are spelled out in English:
df.en <- data.frame(var1 = c("two thousand", "one hundred", "seventyfour"),
                    var2 = c("twenty two", "fifty", "six"),
                    stringsAsFactors = FALSE)
> df.en
          var1       var2
1 two thousand twenty two
2  one hundred      fifty
3  seventyfour        six
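I wrote a quick & dirty user-defined function that guesses whether a string can be converted to a number (see below).
Applying the function to individual elements as a test works fine: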
> translateToNumbers("eighteen")
[1] 18
> translateToNumbers("one hundred and twenty two")
[1] 122
The problem appears when I use lapply to translate the data frame above:
> lapply(df.en, translateToNumbers)
Only the last evaluated value of each column is returned:
$var1
[1] 74

$var2
[1] 6
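A likely explanation, sketched here rather than confirmed against the full function: translateToNumbers ends with eval(parse(text = x)). When x holds several strings, parse() produces one expression per string, and eval() runs them all but returns only the value of the last one. The strings in exprs below are the arithmetic text the function would build for the var1 column; exprs and parsed are illustrative names, not part of the original code.

```r
# Arithmetic text that the gsub chain would produce for
# "two thousand", "one hundred" and "seventyfour" (illustrative input).
exprs <- c("(0+2)*(1000)+(0)", "(0+100)", "(0+70+4)")

parsed <- parse(text = exprs)

n_expr     <- length(parsed)      # 3: one expression per input string
last_only  <- eval(parsed)        # 74: only the last expression's value
first_only <- eval(parsed[[1]])   # 2000: evaluating a single expression

print(n_expr)
print(last_only)
print(first_only)
```

This matches the output above: lapply passes each column as one vector, so each column collapses to the value of its last row.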
If possible, I need to translate the whole data frame. Here is the function:
library(magrittr)
translateToNumbers <- function(x) {
  # Rewrite each number word as an arithmetic fragment, e.g. "twenty two" -> "+20 +2".
  # Teens and tens come before the units so "fourteen" is not consumed by "four".
  x[] <- gsub("^thousand", "1000)+", x, ignore.case = TRUE) %>%
    gsub("eleven", "+11", ., ignore.case = TRUE) %>%
    gsub("twelve", "+12", ., ignore.case = TRUE) %>%
    gsub("thirteen", "+13", ., ignore.case = TRUE) %>%
    gsub("fourteen", "+14", ., ignore.case = TRUE) %>%
    gsub("fifteen", "+15", ., ignore.case = TRUE) %>%
    gsub("sixteen", "+16", ., ignore.case = TRUE) %>%
    gsub("seventeen", "+17", ., ignore.case = TRUE) %>%
    gsub("eighteen", "+18", ., ignore.case = TRUE) %>%
    gsub("nineteen", "+19", ., ignore.case = TRUE) %>%
    gsub("twenty", "+20", ., ignore.case = TRUE) %>%
    gsub("thirty", "+30", ., ignore.case = TRUE) %>%
    gsub("forty", "+40", ., ignore.case = TRUE) %>%
    gsub("fifty", "+50", ., ignore.case = TRUE) %>%
    gsub("sixty", "+60", ., ignore.case = TRUE) %>%
    gsub("seventy", "+70", ., ignore.case = TRUE) %>%
    gsub("eighty", "+80", ., ignore.case = TRUE) %>%
    gsub("ninety", "+90", ., ignore.case = TRUE) %>%
    gsub("one hundred", "+100", ., ignore.case = TRUE) %>%
    gsub("two hundred", "+200", ., ignore.case = TRUE) %>%
    gsub("three hundred", "+300", ., ignore.case = TRUE) %>%
    gsub("four hundred", "+400", ., ignore.case = TRUE) %>%
    gsub("five hundred", "+500", ., ignore.case = TRUE) %>%
    gsub("six hundred", "+600", ., ignore.case = TRUE) %>%
    gsub("seven hundred", "+700", ., ignore.case = TRUE) %>%
    gsub("eight hundred", "+800", ., ignore.case = TRUE) %>%
    gsub("nine hundred", "+900", ., ignore.case = TRUE) %>%
    gsub("one", "+1", ., ignore.case = TRUE) %>%
    gsub("two", "+2", ., ignore.case = TRUE) %>%
    gsub("three", "+3", ., ignore.case = TRUE) %>%
    gsub("four", "+4", ., ignore.case = TRUE) %>%
    gsub("five", "+5", ., ignore.case = TRUE) %>%
    gsub("six", "+6", ., ignore.case = TRUE) %>%
    gsub("seven", "+7", ., ignore.case = TRUE) %>%
    gsub("eight", "+8", ., ignore.case = TRUE) %>%
    gsub("nine", "+9", ., ignore.case = TRUE) %>%
    # Multipliers close the running sum, scale it, and open a new sum.
    gsub("millions", ")*(1000000)+(0", ., ignore.case = TRUE) %>%
    gsub("million", ")*(1000000)+(0", ., ignore.case = TRUE) %>%
    gsub("thousand", ")*(1000)+(0", ., ignore.case = TRUE) %>%
    gsub("ten", "+10", ., ignore.case = TRUE) %>%
    # Drop filler words and spaces, wrap the text in "(0 ... )", then tidy operators.
    gsub("and", "", ., ignore.case = TRUE) %>%
    gsub(" ", "", ., ignore.case = TRUE) %>%
    gsub("^", "(0", ., ignore.case = TRUE) %>%
    gsub("$", ")", ., ignore.case = TRUE) %>%
    gsub("\\(0\\(", "", ., ignore.case = TRUE) %>%
    gsub("\\+\\+", "\\+\\(", ., ignore.case = TRUE) %>%
    gsub("\\)\\+\\)", "\\)", ., ignore.case = TRUE)
  # Parse and evaluate the assembled arithmetic text.
  return(eval(parse(text = x)))
}
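One way to translate every cell without touching the function is to call it once per string instead of once per column. The sketch below assumes the translator returns a single number for a single string, as in the "eighteen" test. translate_df and toy_translate are hypothetical names introduced here: toy_translate is a small stand-in that ends in the same eval(parse(text = ...)) step, so the block runs on its own.

```r
# Hypothetical helper (not from the original post): calls `f` once per cell,
# so a function that only handles one string at a time still fills every row.
translate_df <- function(df, f) {
  df[] <- lapply(df, function(col) {
    vapply(col, f, numeric(1), USE.NAMES = FALSE)
  })
  df
}

# Toy stand-in for translateToNumbers: it looks up ready-made arithmetic
# text and evaluates it, keeping only the last result when given a vector.
toy_translate <- function(x) {
  lookup <- c("two thousand" = "2*1000", "one hundred" = "100",
              "seventyfour" = "70+4", "twenty two" = "20+2",
              "fifty" = "50", "six" = "6")
  eval(parse(text = lookup[x]))
}

df.en <- data.frame(var1 = c("two thousand", "one hundred", "seventyfour"),
                    var2 = c("twenty two", "fifty", "six"),
                    stringsAsFactors = FALSE)

df.num <- translate_df(df.en, toy_translate)
print(df.num)
```

With the real function, the call would presumably be translate_df(df.en, translateToNumbers). vapply is used instead of sapply so that a cell that does not evaluate to exactly one number raises an error rather than silently changing the result's shape.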