Some strings in a data frame that I read from a CSV file have encoding problems.
This is how I read the data frame:
data <- read.csv(file.choose(), colClasses = c(est_il_un_retweet="character", identifiant_du_tweet_genere_par_twitter="character", texte_du_tweet="character"))
When I ask for the encoding of the problematic column, I get this:
> Encoding(data$texte_du_tweet)
[1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[7] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
...
Using enc2utf8() does not seem to help:
> Encoding(enc2utf8(data$texte_du_tweet))
[1] "UTF-8" "UTF-8" "unknown" "UTF-8" "unknown" "unknown"
[7] "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8" "UTF-8"
[13] "unknown" "unknown" "UTF-8" "UTF-8" "UTF-8" "UTF-8"
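
One thing that may be worth checking before concluding that enc2utf8() failed: R never marks pure-ASCII strings with a declared encoding, so Encoding() reports "unknown" for them even after conversion. The mixed output above could therefore simply mean that some tweets contain only ASCII characters. The following is a minimal sketch of that check; the vector x is a made-up sample (an assumption for illustration), not the actual tweet data.

```r
# Hypothetical sample: one pure-ASCII string and two with accented characters.
# The \u escapes produce strings that R marks as UTF-8.
x <- c("plain ascii", "caf\u00e9", "d\u00e9j\u00e0 vu")

# iconv() returns NA for any element that cannot be represented in ASCII,
# so is.na() flags the elements containing non-ASCII characters.
has_non_ascii <- is.na(iconv(x, from = "UTF-8", to = "ASCII"))

# Encoding marks after enc2utf8(): ASCII-only elements stay "unknown".
marks <- Encoding(enc2utf8(x))

print(data.frame(text = x, has_non_ascii = has_non_ascii, mark = marks))
```

If every "unknown" element in the real column turns out to be ASCII-only under a check like this, the marks are expected and the remaining question is whether the non-ASCII strings display correctly.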