Question

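I am reading a file that contains one word per line. Some of the words cause problems because they seem to contain unusual characters. See the following example, which uses the first word in my list: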

stopwords <- read.csv("stopwords_fr.txt",stringsAsFactors = FALSE,header=FALSE,encoding="UTF-8")$V1
stopwords[1] # "a" , if you copy paste into R studio this character with the quotes around it, you'll see a little red dot preceding the a.
stopwords[1] == "a" # FALSE

How does this happen, and how can I avoid it? If I cannot avoid it, how can I convert this dotted "a" into a regular "a"?

Edit:

You can reproduce the problem by copy-pasting the following line into RStudio:

"a" == "a" # FALSE

Here is where I got the file from: https://sites.google.com/site/kevinbouge/stopwords-lists/stopwords_fr.txt?attredirects=0&d=1

According to Notepad++, the file's encoding is UTF-8-BOM. However, passing "UTF-8-BOM" as the encoding did not help, although it seems to work in this answer: Read a UTF-8 text file with BOM

stopwords <- read.csv("stopwords_fr.txt",stringsAsFactors = FALSE,header=FALSE,encoding="UTF-8-BOM")$V1
stopwords[1] # "ï»¿a"

I am using R version 3.0.2.
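
For reference, a minimal sketch of what might be going on, not a confirmed fix for this exact file. In read.csv, the encoding argument only marks the resulting strings as UTF-8, while fileEncoding controls how the file is decoded, and the "UTF-8-BOM" value is documented (as of R 3.0.0) as something to pass to fileEncoding. The temporary file and the names path, words, raw_words and clean are assumptions made for illustration; the real stopwords_fr.txt is not used here.

```r
# Build a small stand-in file (hypothetical) that starts with the UTF-8 BOM
# bytes EF BB BF, followed by one word per line.
path <- tempfile(fileext = ".txt")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)), charToRaw("a\nb\n")), path)

# fileEncoding (rather than encoding) asks R to decode the file as UTF-8
# and to drop a leading byte order mark if one is present.
words <- read.csv(path, stringsAsFactors = FALSE, header = FALSE,
                  fileEncoding = "UTF-8-BOM")$V1
words[1] == "a"

# Fallback if the data was already read with the BOM attached:
# strip a leading U+FEFF from each string.
raw_words <- readLines(path, encoding = "UTF-8")
clean <- sub("^\ufeff", "", raw_words)
clean[1] == "a"
```

If this assumption about the cause holds, the same sub() call could be applied to the stopwords vector that was already loaded.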

Unusual characters when reading a file with read.csv

0 answers: