Python: Automatically introduce some word misspellings into a phrase?

Posted: 2019-07-05 19:24:09

Tags: python arrays string word

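Does anyone have an idea how to automatically introduce common spelling mistakes into the words of a phrase?

I found How to introduce typo in a string?, but I think it is a bit too generic, because it just replaces the nth letter with a random character.

I'd like to introduce "common" typos.

Any ideas on how to do this?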

1 answer:

Answer 0 (score: 2)

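For the sake of explanation, let's assume you have a string variable message that you want to introduce typos into. My strategy for introducing (common) typos into message is to replace random letters in message with other letters that are nearby on the keyboard (e.g. replacing a with s, or d with f). Here's how: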

import random # random typos

message = "The quick brown fox jumped over the big red dog."

# convert the message to a list of characters
message = list(message)

typo_prob = 0.1  # fraction (0.0-1.0) of characters to turn into typos

# the number of characters that will be typos
n_chars_to_flip = round(len(message) * typo_prob)
# is a letter capitalized?
capitalization = [False] * len(message)
# make all characters lowercase & record uppercase
for i in range(len(message)):
    capitalization[i] = message[i].isupper()
    message[i] = message[i].lower()

# positions that will be flipped (sampled without replacement,
# so the same position is never picked twice)
pos_to_flip = random.sample(range(len(message)), n_chars_to_flip)

# for each key, the list of keys next to it on a QWERTY keyboard
nearbykeys = {
    'a': ['q','w','s','x','z'],
    'b': ['v','g','h','n'],
    'c': ['x','d','f','v'],
    'd': ['s','e','r','f','c','x'],
    'e': ['w','s','d','r'],
    'f': ['d','r','t','g','v','c'],
    'g': ['f','t','y','h','b','v'],
    'h': ['g','y','u','j','n','b'],
    'i': ['u','j','k','o'],
    'j': ['h','u','i','k','n','m'],
    'k': ['j','i','o','l','m'],
    'l': ['k','o','p'],
    'm': ['n','j','k','l'],
    'n': ['b','h','j','m'],
    'o': ['i','k','l','p'],
    'p': ['o','l'],
    'q': ['w','a','s'],
    'r': ['e','d','f','t'],
    's': ['w','e','d','x','z','a'],
    't': ['r','f','g','y'],
    'u': ['y','h','j','i'],
    'v': ['c','f','g','b'],
    'w': ['q','a','s','e'],
    'x': ['z','s','d','c'],
    'y': ['t','g','h','u'],
    'z': ['a','s','x'],
    ' ': ['c','v','b','n','m']
}

# insert typos
for pos in pos_to_flip:
    # skip characters (digits, punctuation) that have no entry in nearbykeys
    # instead of aborting the whole loop
    if message[pos] in nearbykeys:
        message[pos] = random.choice(nearbykeys[message[pos]])

# reinsert capitalization
for i in range(len(message)):
    if capitalization[i]:
        message[i] = message[i].upper()

# recombine the message into a string
message = ''.join(message)

# show the message in the console
print(message)
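The same idea can be packaged as a reusable function. This is a sketch that is not part of the original answer: the names add_typos and build_neighbors, the row offsets and the 1.3 key-width distance threshold are illustrative assumptions. Instead of typing the neighbor table by hand, it derives adjacent keys from the three QWERTY letter rows, samples typo positions only among letters (so, unlike the ' ' entry above, spaces and punctuation are never changed), preserves case, and accepts an optional seed for reproducible output.

```python
import random

# Assumption: approximate QWERTY geometry. Each row is shifted right by a
# typical stagger (in key widths); the values are illustrative, not measured.
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
ROW_OFFSETS = [0.0, 0.25, 0.75]


def build_neighbors(max_dist=1.3):
    """Hypothetical helper: map each letter to the sorted list of letters
    whose key centres lie within max_dist key widths of it."""
    centres = {}
    for row_idx, (row, offset) in enumerate(zip(QWERTY_ROWS, ROW_OFFSETS)):
        for col_idx, ch in enumerate(row):
            centres[ch] = (row_idx, col_idx + offset)

    neighbors = {}
    for a, (ra, ca) in centres.items():
        neighbors[a] = sorted(
            b
            for b, (rb, cb) in centres.items()
            if b != a and ((ra - rb) ** 2 + (ca - cb) ** 2) ** 0.5 <= max_dist
        )
    return neighbors


NEIGHBORS = build_neighbors()


def add_typos(text, typo_prob=0.1, seed=None):
    """Hypothetical helper: replace about typo_prob of the letters in text
    with a neighboring key, keeping the original letter's case."""
    rng = random.Random(seed)
    chars = list(text)
    # Only letters are eligible; spaces, digits and punctuation stay as-is.
    eligible = [i for i, ch in enumerate(chars) if ch.lower() in NEIGHBORS]
    n_typos = min(len(eligible), round(len(eligible) * typo_prob))
    # sample() picks distinct positions, so no letter is flipped twice.
    for i in rng.sample(eligible, n_typos):
        replacement = rng.choice(NEIGHBORS[chars[i].lower()])
        chars[i] = replacement.upper() if chars[i].isupper() else replacement
    return "".join(chars)


print(add_typos("The quick brown fox jumped over the big red dog.", seed=42))
```

Because a letter is never listed as its own neighbor, every sampled position really changes, so exactly round(number_of_letters * typo_prob) letters differ from the input. Counting only letters, rather than all characters as the script above does, keeps the typo rate from being diluted by spaces and punctuation; whether that is preferable depends on what you want to simulate.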