Encoding problem with data.frame strings

Time: 2015-04-17 15:48:27

Tags: r string dataframe extract

Some of the strings in a data frame that I read from a CSV file have encoding problems.

This is how I read the data frame:

data <- read.csv(file.choose(), colClasses = c(est_il_un_retweet="character", identifiant_du_tweet_genere_par_twitter="character", texte_du_tweet="character"))

If I ask for the encoding of the problematic column, I get this:

> Encoding(data$texte_du_tweet)
[1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[7] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
...

Using enc2utf8() does not seem to work:

> Encoding(enc2utf8(data$texte_du_tweet))
[1] "UTF-8"   "UTF-8"   "unknown" "UTF-8"   "unknown" "unknown"
[7] "UTF-8"   "UTF-8"   "UTF-8"   "UTF-8"   "UTF-8"   "UTF-8" 
[13] "unknown" "unknown" "UTF-8"   "UTF-8"   "UTF-8"   "UTF-8"
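
A note on the output above: Encoding() reports "unknown" for any string that contains only ASCII characters, because R does not attach an encoding mark to such strings. A mix of "UTF-8" and "unknown" after enc2utf8() therefore does not by itself show that the conversion failed. The following is a minimal sketch with made-up sample strings (the vector x and the helper is_ascii are hypothetical, not taken from the original data) that checks whether the elements still reported as "unknown" are simply pure ASCII:

```r
# Hypothetical sample values standing in for data$texte_du_tweet.
x <- c("plain ascii tweet", "caf\u00e9 cr\u00e8me")

# Convert to UTF-8 and inspect the declared encoding of each element.
x_utf8 <- enc2utf8(x)
enc <- Encoding(x_utf8)

# Hypothetical helper: TRUE when every code point in a string is below 128.
is_ascii <- function(s) {
  vapply(s, function(el) all(utf8ToInt(el) < 128L), logical(1),
         USE.NAMES = FALSE)
}

# Elements still marked "unknown" are expected to be pure ASCII.
all_unknown_are_ascii <- all(is_ascii(x_utf8[enc == "unknown"]))

print(data.frame(text = x_utf8, encoding = enc, ascii = is_ascii(x_utf8)))
```

If the text still displays incorrectly after this check, one possible cause is that the file's encoding differs from the session's native encoding, since enc2utf8() treats unmarked strings as being in the native encoding. In that case it may help to declare the encoding when reading the file, for example through the encoding or fileEncoding argument of read.csv().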

0 answers:

No answers