How to generate strings of letters based on certain parameters

Time: 2016-10-19 13:04:02

Tags: r string replace words

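I have a set of sentences, each with a different number of words. I need to replace every word with a string of letters, but the letter string has to follow specific criteria. For example, the letter 't' can only be replaced by the letters 'i', 'l' or 'f'; the letter 'e' can only be replaced by 'o' or 'c', and so on for every letter of the alphabet. In addition, the spaces between words need to stay intact, as do periods, apostrophes and other punctuation marks. An example:

Original sentence: He loves dog.
Sentence with letter strings: Fc tcwoz bcy.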

Is there a way to automate this procedure in R? Thanks.

Addendum: I need to replace about 400 sentences. The sentences are stored in a variable of a data frame (data$sentences).

1 answer:

Answer 0: (score: 1)

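Update 2: some code refactoring, added a simple fallback strategy to handle missing characters (so we can encode every character in a given string, even if we don't have an exact mapping for it), and added an example loop over a vector of strings.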

# we define two different strings to be encoded
mystrings <- c('bye', 'BYE')

# the dictionary with the replacements for each letter
# for the lowercase letters we are defining the exact entries
replacements <- character()
replacements['a'] <- 'xy'
replacements['b'] <- 'zp'
replacements['c'] <- '91'
# ... 
replacements['e'] <- 'xyv'
replacements['y'] <- 'opj'

# then we define a generic "fallback" entry
# to be used when we have no clues on how to encode a 'new' character
replacements['fallback'] <- '2345678'


# string, named vector -> character
# returns a single character chosen at random from the dictionary
get_random_entry <- function(entry, dictionary) {

  value <- dictionary[entry]

  # if we don't know how to encode it, use the fallback
  if (is.na(value)) {
    value <- dictionary['fallback']
  }

  # possible replacement for the current character
  possible.replacements <- strsplit(value[[1]], '')[[1]]

  # the actual replacement
  result <- sample(possible.replacements, 1)

  return(result)
}

# string, named vector -> string
# encode the given string, using the given named vector as dictionary
encode <- function(s, dictionary) {

  # get the actual substitutions
  substitutions <- sapply(strsplit(s, '')[[1]], function(ch) {

    # for each char in the string 's'
    # we collect the respective encoded version
    return(get_random_entry(ch, dictionary))

  }, USE.NAMES = FALSE, simplify = TRUE)

  # paste the resulting vector into a single string
  result <- paste(substitutions, collapse = '')

  # and return it
  return(result)
}

# we can use sapply to process all the strings defined in mystrings
# for 'bye' we know how to translate
# for 'BYE' we don't know; we'll use the fallback entry
encoded_strings <- sapply(mystrings, function(s) {
  # encode a single string
  encode(s, replacements)
}, USE.NAMES = FALSE)

encoded_strings
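
Note that the code above sends every character without a dictionary entry to the fallback, and that includes spaces and punctuation, while the question asks for those to stay intact. The following is only a minimal sketch of one possible variation, not part of the original answer. It assumes that characters without an entry should pass through unchanged and that capitalisation should be kept. The names `letter_map` and `encode_keep_punct` are hypothetical, and only the entries for 't' and 'e' come from the question; the other entries are made up for illustration.

```r
# Hypothetical dictionary: only 't' and 'e' are taken from the question,
# the remaining entries are invented for this example.
letter_map <- c(t = "ilf", e = "oc", h = "fk", d = "bp")

# string, named vector -> string
# Replaces each mapped letter with one randomly chosen allowed letter.
# Characters with no entry (spaces, punctuation, unmapped letters)
# are returned unchanged.
encode_keep_punct <- function(s, dictionary) {
  chars <- strsplit(s, "")[[1]]
  encoded <- vapply(chars, function(ch) {
    key <- tolower(ch)
    if (!key %in% names(dictionary)) {
      return(ch)
    }
    choices <- strsplit(dictionary[[key]], "")[[1]]
    picked <- sample(choices, 1)
    # keep the capitalisation of the original letter
    if (ch != key) toupper(picked) else picked
  }, character(1), USE.NAMES = FALSE)
  paste(encoded, collapse = "")
}

set.seed(42)  # only to make the random choices repeatable

# Assumed layout: the sentences live in a character column of a data frame.
data <- data.frame(sentences = c("He loves dog.", "Don't tell them!"),
                   stringsAsFactors = FALSE)
data$encoded <- vapply(data$sentences, encode_keep_punct, character(1),
                       dictionary = letter_map, USE.NAMES = FALSE)
data
```

Because each replacement is drawn with `sample`, repeated runs give different encodings unless a seed is set. Every encoded sentence has the same length as its original, since each character is replaced by exactly one character or left as it is.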