I want to correct mistyped data in R. For example, if I have a vector
V=c('PO','PO','P0')
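I would like R to recognize that the 0 in the last entry should be an O and change it. Is there any way to do this? I tried using correctTypos from R's deducorrect package, but I ran into problems with the editset: I can't seem to specify that all entries must be letters. Any help is appreciated.
Another example is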
V2=c('PL','P1','PL','XX')
That 1 should be an L.
Answer 0 (score: 0)
The Jaro-Winkler distance was developed to catch data-entry errors. But on entries only two characters long it is difficult to apply, because a single error changes half the string and so scores higher than you want it to. You could combine it with the other distance measures available in the stringdist package, but in this case that might be too complicated.
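To make the two-character problem concrete, here is a minimal sketch. It does not use Jaro-Winkler or stringdist; it swaps in base R's adist (generalized Levenshtein distance) so that it runs without extra packages. The list of valid codes is a hypothetical assumption, not something given in the question.

```r
# Hypothetical list of valid codes (an assumption for illustration).
candidates <- c("PO", "PL", "XX")

# Edit distance from the mistyped entry to each candidate.
# adist() returns a 1 x 3 matrix here; drop() turns it into a plain vector.
d <- drop(adist("P0", candidates))
names(d) <- candidates
d

# Candidates tied for the smallest distance: the typo cannot be resolved
# from the distance alone when more than one name is returned.
names(d)[d == min(d)]
```

Both "PO" and "PL" are a single substitution away from "P0", so a distance measure alone cannot choose between them on strings this short.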
Given your examples, you might want to use the base function chartr and set up a replacement of digits with letters.
chartr("01","OL", V2)
[1] "PL" "PL" "PL" "XX"
chartr("01","OL", V)
[1] "PO" "PO" "PO"
This will always replace a 1 with an L and a 0 (zero) with an O. You can add 5 for S, and so on. But if other combinations occur, it might get complicated.
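As a rough sketch of that extension, the mapping can be wrapped in a small helper. The name fix_lookalikes, the 5-to-S pair as a default, and the sample vector codes are assumptions for illustration only; the final check simply flags entries that still contain a non-letter so they can be reviewed by hand.

```r
# Hypothetical helper: map look-alike digits to letters with base R's chartr().
fix_lookalikes <- function(x, from = "015", to = "OLS") {
  # Each character in 'from' is replaced by the character at the same
  # position in 'to', so the two strings must have equal length.
  stopifnot(nchar(from) == nchar(to))
  chartr(from, to, x)
}

V  <- c("PO", "PO", "P0")
V2 <- c("PL", "P1", "PL", "XX")

fix_lookalikes(V)
fix_lookalikes(V2)

# Made-up entries: "P5" is covered by the mapping, "P7" is not.
codes <- c("PO", "P5", "P7")
fixed <- fix_lookalikes(codes)

# Entries that still contain something other than a letter need manual review.
fixed[grepl("[^A-Za-z]", fixed)]
```

Keeping the mapping in two arguments makes it easy to add further pairs, but it stays a blanket substitution: it cannot tell whether a given 1 was meant as an L or an I.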
Also note that the next iteration of the deducorrect package is the deductive package.