Question

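In R, I want to read a .txt file that contains only letters, with no spaces between the words. Can R split the text into words using an English dictionary? Example: "oneshoulddothatheshouldalwayslearn" should give "one should do that he should always learn". Thanks.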

Answer 1

Here is a function:

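# Recursively split `original` into dictionary words and print every complete split
unmash <- function(original, sofar=c(), rest=original, words){
    for(L in seq_len(nchar(rest))){
        # candidate word: the first L characters of the remaining text
        finding = substr(rest,1,L)
        # exact lookup, so the candidate is never interpreted as a regular expression
        if(finding %in% words){
            rest2 = substr(rest,L+1,nchar(rest))
            if(rest2==""){
                # nothing left over: this is a complete segmentation
                message("Original: ",original," = ",paste(c(sofar,finding),collapse=","))
            }else{
                # keep this word and try to split what remains
                unmash(original, c(sofar,finding), rest2, words)
            }
        }
    }
}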
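
The function above only reports its results through message(). If the splits are needed as data, a variant that returns them could look like the following. This is a minimal sketch, not part of the original answer: the name segment_all and the toy word list are made up for illustration.

```r
# Hypothetical helper: return every segmentation of `text` as a list of
# character vectors, instead of printing them.
segment_all <- function(text, words) {
    # The empty string has exactly one segmentation: no words at all
    if (text == "") return(list(character(0)))
    out <- list()
    for (L in seq_len(nchar(text))) {
        first <- substr(text, 1, L)
        if (first %in% words) {
            # Split the remainder, then prepend the word just found
            tails <- segment_all(substr(text, L + 1, nchar(text)), words)
            for (tl in tails) out[[length(out) + 1]] <- c(first, tl)
        }
    }
    out
}

# Toy word list, assumed for illustration only
toy_words <- c("a", "at", "do", "dot", "doth", "hat", "that")
res <- segment_all("dothat", toy_words)
for (r in res) cat(paste(r, collapse = ","), "\n")
```

Like the original, this explores every prefix recursively, so it can become slow on long inputs with many ambiguous splits.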

You need a word list. I built mine like this:

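# Read a dictionary file and tidy it up
words = function(f){
    w = scan(f,what="")      # read whitespace-separated tokens as character strings
    w = w[nchar(w)>1]        # drop every single-character entry
    w = c(w,"a","i","o")     # add back the only single letters accepted as words
    w
}
wordlist= words("/usr/share/dict/words")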

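That file is the standard Unix word list, with one word per line. It lists almost every single letter as a word, though, so the function above drops all single-character entries and then adds back "a", "i" and "o".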
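
The same clean-up idea can be taken a little further. The sketch below is an assumption on my part rather than something the answer does: it also lower-cases the entries, drops anything that is not purely alphabetic (such as possessives) and removes duplicates. The name clean_words and the sample vector are hypothetical.

```r
# Hypothetical clean-up of a raw word vector; `keep_single` lists the
# single letters that should still count as words.
clean_words <- function(w, keep_single = c("a", "i", "o")) {
    w <- tolower(w)                             # match lower-case input text
    w <- w[grepl("^[a-z]+$", w)]                # keep purely alphabetic entries
    w <- w[nchar(w) > 1 | w %in% keep_single]   # drop stray single letters
    unique(w)
}

# Sample input, made up for illustration
raw <- c("A", "b", "Cat", "cat", "dog's", "I", "o", "at")
cleaned <- clean_words(raw)
print(cleaned)
```

Lower-casing is a design choice: it suits all-lower-case input like the example, but it would also turn proper nouns into ordinary candidate words.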

Here is my function run on your example. Note that my dictionary allows five possible splits:

> test = "oneshoulddothatheshouldalwayslearn"
> unmash(test, words=wordlist)
Original: oneshoulddothatheshouldalwayslearn = one,should,do,that,he,should,always,learn
Original: oneshoulddothatheshouldalwayslearn = one,should,dot,ha,the,should,always,learn
Original: oneshoulddothatheshouldalwayslearn = one,should,dot,hat,he,should,always,learn
Original: oneshoulddothatheshouldalwayslearn = one,should,doth,a,the,should,always,learn
Original: oneshoulddothatheshouldalwayslearn = one,should,doth,at,he,should,always,learn
>
