Does anyone know how to automatically introduce common spelling mistakes into the words of a phrase?
I found How to introduce typo in a string?, but I think it's a bit too general, because it just replaces the nth letter with a random character.
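For comparison, the naive approach described there can be sketched as follows. `replace_nth_with_random` is a hypothetical name, and this is not the linked answer's code. Any letter of the alphabet can come out, so "hello" is as likely to become "hqllo" as "hwllo", which is why it does not model the typos people actually make.

```python
import random
import string


def replace_nth_with_random(text, n, rng=random):
    """Naive baseline (hypothetical helper): swap the character at index n
    for a uniformly random lowercase letter, ignoring keyboard layout."""
    if not 0 <= n < len(text):
        raise IndexError("n is out of range")
    return text[:n] + rng.choice(string.ascii_lowercase) + text[n + 1:]


print(replace_nth_with_random("hello", 1))
```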
I'd like to introduce "common" typos instead.
Any ideas on how to do this?
Answer (score: 2)
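For the purposes of explanation, let's assume you have a string variable `message` that you want to introduce typos into. My strategy for introducing (common) typos into `message` is to replace *random* letters in `message` with other letters that are nearby on the keyboard (e.g. replacing `a` with `s`, or `d` with `f`). Here's how: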
```python
import random  # random typos

message = "The quick brown fox jumped over the big red dog."

# convert the message to a list of characters
message = list(message)

typo_prob = 0.1  # fraction (out of 1.0) of characters to become typos

# the number of characters that will be typos
n_chars_to_flip = round(len(message) * typo_prob)

# record which characters are uppercase, then make everything lowercase
capitalization = [False] * len(message)
for i in range(len(message)):
    capitalization[i] = message[i].isupper()
    message[i] = message[i].lower()

# distinct positions of the characters that will be flipped
# (random.sample never picks the same position twice)
pos_to_flip = random.sample(range(len(message)), n_chars_to_flip)

# dictionary: for each key, the list of keys
# next to it on a QWERTY keyboard
nearbykeys = {
    'a': ['q','w','s','x','z'],
    'b': ['v','g','h','n'],
    'c': ['x','d','f','v'],
    'd': ['s','e','r','f','c','x'],
    'e': ['w','s','d','r'],
    'f': ['d','r','t','g','v','c'],
    'g': ['f','t','y','h','b','v'],
    'h': ['g','y','u','j','n','b'],
    'i': ['u','j','k','o'],
    'j': ['h','u','i','k','n','m'],
    'k': ['j','i','o','l','m'],
    'l': ['k','o','p'],
    'm': ['n','j','k','l'],
    'n': ['b','h','j','m'],
    'o': ['i','k','l','p'],
    'p': ['o','l'],
    'q': ['w','a','s'],
    'r': ['e','d','f','t'],
    's': ['w','e','d','x','z','a'],
    't': ['r','f','g','y'],
    'u': ['y','h','j','i'],
    'v': ['c','f','g','b'],
    'w': ['q','a','s','e'],
    'x': ['z','s','d','c'],
    'y': ['t','g','h','u'],
    'z': ['a','s','x'],
    ' ': ['c','v','b','n','m']
}

# insert typos
for pos in pos_to_flip:
    # skip characters with no entry (punctuation, digits, ...)
    # instead of aborting the whole loop
    try:
        typo_arrays = nearbykeys[message[pos]]
    except KeyError:
        continue
    message[pos] = random.choice(typo_arrays)

# reinsert capitalization
for i in range(len(message)):
    if capitalization[i]:
        message[i] = message[i].upper()

# recombine the message into a string
message = ''.join(message)

# show the message in the console
print(message)
```
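If you need this more than once, the same idea can be packaged as a function. This is a sketch under a few assumptions, not a drop-in replacement for the script. It derives the neighbour table from the three letter rows of a US QWERTY keyboard instead of writing it by hand, treating each row as shifted to the right of the row above it. It only alters letters (spaces and punctuation are left alone, unlike the `' '` entry in the table), so `typo_prob` is a fraction of the letters rather than of all characters. It picks distinct positions with `random.sample` and takes an optional `seed` for reproducible output. The names `ROWS`, `build_neighbors`, `NEIGHBORS` and `add_typos` are hypothetical.

```python
import random

# Assumption: US QWERTY letter rows; each lower row is treated as
# shifted to the right of the row above it, as on a physical keyboard.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_neighbors(rows=ROWS):
    """Map each letter to the letters physically adjacent to it."""
    neighbors = {}
    for r, row in enumerate(rows):
        for c, key in enumerate(row):
            candidates = [
                (r, c - 1), (r, c + 1),      # left / right on the same row
                (r - 1, c), (r - 1, c + 1),  # up-left / up-right
                (r + 1, c - 1), (r + 1, c),  # down-left / down-right
            ]
            neighbors[key] = [
                rows[rr][cc]
                for rr, cc in candidates
                if 0 <= rr < len(rows) and 0 <= cc < len(rows[rr])
            ]
    return neighbors


NEIGHBORS = build_neighbors()


def add_typos(text, typo_prob=0.1, seed=None):
    """Replace about typo_prob of the letters in text with an adjacent key."""
    rng = random.Random(seed)  # seedable, so results can be reproduced
    chars = list(text)
    # only letters that have a neighbour entry are eligible
    eligible = [i for i, ch in enumerate(chars) if ch.lower() in NEIGHBORS]
    n = min(len(eligible), round(len(eligible) * typo_prob))
    for i in rng.sample(eligible, n):  # distinct positions, no repeats
        typo = rng.choice(NEIGHBORS[chars[i].lower()])
        # keep the original letter's case
        chars[i] = typo.upper() if chars[i].isupper() else typo
    return "".join(chars)


print(add_typos("The quick brown fox jumped over the big red dog.", seed=42))
```

Generating the table from the rows avoids hand-entry slips, such as a key being listed as its own neighbour, and makes it easy to swap in another layout by changing `ROWS`.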