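I have a set of sentences, each containing a different number of words. I need to replace each word with a string of letters, but the letters must follow specific criteria. For example, the letter 't' can only be replaced by the letters 'i', 'l' or 'f'; the letter 'e' can only be replaced by 'o' or 'c'; and so on for every letter of the alphabet. In addition, the spaces between words must stay intact, as must full stops, apostrophes and other punctuation symbols. For example:
Original sentence: He loves dog.
Sentence with the letter strings: Fc tcwoz bcy.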
Is there a way to automate this procedure in R? Thanks.
Addendum: I need to replace about 400 sentences. The sentences are stored in a variable of a data frame (data$sentence).
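Whatever encoder is used, it helps to be able to check that an encoded sentence actually obeys the rules above. The following is a minimal sketch of such a check, not part of the original question or answer. `allowed` and `is_valid_encoding` are hypothetical names, and the table only contains the two rules quoted in the question ('t' and 'e'); the remaining letters would have to be filled in. Letters without a rule are not checked.

```r
# Hypothetical allowed-replacement table: only the two rules quoted in the
# question are filled in; the other letters are left for the user to add.
allowed <- list(t = c("i", "l", "f"), e = c("o", "c"))

# TRUE if `encoded` is a valid encoding of `original` under `allowed`:
# same number of characters, every non-letter kept unchanged, and every
# letter that has a rule replaced by one of its allowed letters.
is_valid_encoding <- function(original, encoded, allowed) {
  a <- strsplit(original, "")[[1]]
  b <- strsplit(encoded, "")[[1]]
  if (length(a) != length(b)) return(FALSE)
  for (i in seq_along(a)) {
    key <- tolower(a[i])
    if (!grepl("[[:alpha:]]", key)) {
      # spaces, full stops, apostrophes etc. must be kept as they are
      if (a[i] != b[i]) return(FALSE)
    } else if (key %in% names(allowed)) {
      # a letter with a rule must map to one of its allowed letters
      if (!(tolower(b[i]) %in% allowed[[key]])) return(FALSE)
    }
  }
  TRUE
}

is_valid_encoding("Let me.", "Lcf mo.", allowed)  # TRUE
is_valid_encoding("Let me.", "Lcf mo!", allowed)  # FALSE: '.' was changed
```

Running this over `data$sentence` and its encoded counterpart (for example with `mapply`) would flag any of the ~400 sentences that break a rule.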
Answer
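Update 2: some code refactoring; added a simple fallback strategy to handle missing characters (so every character in a given string can be encoded even when there is no exact mapping for it); and added an example loop over a vector of strings.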
# we define two different strings to be encoded
mystrings <- c('bye', 'BYE')
# the dictionary with the replacements for each letter
# for the lowercase letters we are defining the exact entries
replacements <- character(0)
replacements['a'] <- 'xy'
replacements['b'] <- 'zp'
replacements['c'] <- '91'
# ...
replacements['e'] <- 'xyv'
replacements['y'] <- 'opj'
# then we define a generic "fallback" entry
# to be used when we have no clues on how to encode a 'new' character
replacements['fallback'] <- '2345678'
# string, named vector -> character
# returns a single character chosen at random from the dictionary;
# characters that are not letters (spaces, punctuation) are returned unchanged
get_random_entry <- function(entry, dictionary) {
  # keep spaces, full stops, apostrophes and other punctuation intact
  if (!grepl('[[:alpha:]]', entry)) {
    return(entry)
  }
  value <- dictionary[entry]
  # if we don't know how to encode it, use the fallback
  if (is.na(value)) {
    value <- dictionary['fallback']
  }
  # possible replacements for the current character
  possible.replacements <- strsplit(value[[1]], '')[[1]]
  # the actual replacement
  result <- sample(possible.replacements, 1)
  return(result)
}
# string, named vector -> string
# encode the given string, using the given named vector as dictionary
encode <- function(s, dictionary) {
  # get the actual substitutions:
  # for each char in the string 's' we collect the respective encoded version
  substitutions <- vapply(strsplit(s, '')[[1]], function(ch) {
    get_random_entry(ch, dictionary)
  }, character(1), USE.NAMES = FALSE)
  # paste the resulting vector into a single string and return it
  paste(substitutions, collapse = '')
}
# we can use sapply to process all the strings defined in mystrings
# for 'bye' we know how to translate
# for 'BYE' we don't know; we'll use the fallback entry
encoded_strings <- sapply(mystrings, function(s) {
  # encode a single string
  encode(s, replacements)
}, USE.NAMES = FALSE)
encoded_strings
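Since the question stores ~400 sentences in `data$sentence`, the same idea can be applied to a whole data frame column at once. The sketch below is a self-contained variant, not the answerer's code: `encode_sentence` and `allowed` are hypothetical names, the table again holds only the two rules quoted in the question, and (unlike the answer's fallback entry) letters without a rule are simply left unchanged. It also keeps the case of each letter, and calls `set.seed` so the random choices are reproducible between runs.

```r
# Hypothetical stand-in for the asker's data frame (the real one has ~400 rows)
data <- data.frame(sentence = c("The tea.", "Let me eat!"),
                   stringsAsFactors = FALSE)

# Illustrative table: only the two rules quoted in the question
allowed <- list(t = c("i", "l", "f"), e = c("o", "c"))

# Encode one sentence: each letter that has a rule is replaced by a random
# allowed letter with the same case; everything else is kept untouched.
encode_sentence <- function(s, allowed) {
  chars <- strsplit(s, "")[[1]]
  out <- vapply(chars, function(ch) {
    key <- tolower(ch)
    if (!(key %in% names(allowed))) return(ch)
    new <- sample(allowed[[key]], 1)
    if (ch == toupper(ch)) toupper(new) else new
  }, character(1), USE.NAMES = FALSE)
  paste(out, collapse = "")
}

set.seed(1)  # make the random choices reproducible
data$encoded <- vapply(data$sentence, encode_sentence, character(1),
                       allowed = allowed, USE.NAMES = FALSE)
data
```

Storing the result in a new column rather than overwriting `sentence` keeps the originals available for checking.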