I have a CSV file containing Chinese characters encoded in gb2312 (the system default). Here is a sample of the CSV:
date,title,name,id,message
"2014-10-07 8:42:37","元老",879231132,879231132,"加 "
"2014-10-07 8:43:50","元老",879231132,879231132,"这么多空格,不加引号。怎么行。 "
"2014-10-07 8:45:10","新人",451635342,451635342,"想问一下,如果有一些专业词汇不懂 找谁帮忙呀? "
"2014-10-07 8:45:30","大神",532594859,532594859,"发出来,一起研究 "
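I read the file with read.csv, and the values print correctly in the R console, but when I put them into the labels of an rCharts hPlot chart, they show up as garbled, meaningless characters.
I tried Encoding(title) <- "UTF-8" and enc2utf8(), but neither worked.
How can I fix this? Any ideas would be very helpful.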
R version 3.1.1 (2014-07-10)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Chinese (Simplified)_People's Republic of China.936
[2] LC_CTYPE=Chinese (Simplified)_People's Republic of China.936
[3] LC_MONETARY=Chinese (Simplified)_People's Republic of China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_People's Republic of China.936

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] RJSONIO_1.3-0 httr_0.5 rCharts_0.4.5

loaded via a namespace (and not attached):
[1] grid_3.1.1 lattice_0.20-29 plyr_1.8.1 Rcpp_0.11.3 stringr_0.6.2 tools_3.1.1
[7] whisker_0.3-2 yaml_2.1.13
Here is my code:
library(rCharts)
library(httr)
library(RJSONIO)
library(data.table)

# Read the gb2312-encoded CSV (no encoding is declared here)
parsed_data <- read.csv("gb2312.csv", header = TRUE, sep = ",", quote = "\"")

get_top_n_speakers <- function(n = 50) {
  data <- subset(parsed_data, select = c(id, name, title))

  # Count messages per id
  freq_data <- data.frame(table(data$id))
  colnames(freq_data) <- c("id", "msg_cnt")

  # Keep one descriptive row per id and attach its message count
  desc_data <- data[!duplicated(data$id), ]
  df <- merge(desc_data, freq_data, by = "id")

  # Random x positions, only used to spread the points horizontally
  set.seed(666)
  random <- runif(nrow(desc_data))
  df <- cbind(df, random)

  # Top n speakers by message count
  df <- df[order(df$msg_cnt, decreasing = TRUE, na.last = TRUE), ]
  df <- head(x = df, n = n)

  h2 <- hPlot(
    x = "random",
    y = "msg_cnt",
    data = df,
    type = "scatter",
    title = paste("前", n, "个成员", sep = " "),
    group = "title",
    radius = 5
  )
  h2$xAxis(title = NULL, labels = list(format = " "))
  h2$tooltip(useHTML = TRUE, formatter = "#! function() {
    return 'Msg count: <b>' + this.y + '</b><br> Title:<b> '+ this.series.name+'</b><br>name:<b>'+this.name+'</b>';
  } !#")
  h2
}
Answer 0 (score: 0)
I found that iconv helps with some of the characters, but some of the text is still garbled. `iconv(x, from = "gb2312", to = "utf-8")`
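A possible reason the `Encoding(title) <- "UTF-8"` attempt from the question had no effect: `Encoding<-` only changes the encoding a string is declared to have, while `iconv` rewrites the bytes. The sketch below illustrates the difference on a single string. The variable names are made up for the illustration, and the GBK input is simulated from a known UTF-8 string instead of being read from the CSV.

```r
# The title from the sample CSV, written with Unicode escapes so this
# script does not depend on the encoding of the source file.
utf8_text <- "\u5143\u8001"

# Simulate what read.csv sees in the file: the same text as GBK bytes.
gbk_text <- iconv(utf8_text, from = "UTF-8", to = "GBK")

# Relabelling: only the declared encoding changes, the bytes stay GBK.
relabelled <- gbk_text
Encoding(relabelled) <- "UTF-8"
bytes_unchanged <- identical(charToRaw(relabelled), charToRaw(gbk_text))

# Converting: iconv() rewrites the bytes as UTF-8.
converted <- iconv(gbk_text, from = "GBK", to = "UTF-8")
bytes_rewritten <- !identical(charToRaw(converted), charToRaw(gbk_text))
round_trip_ok <- identical(charToRaw(converted), charToRaw(utf8_text))
```

If the relabelled string is handed to something that expects UTF-8, it still receives GBK bytes, which would be consistent with the garbled labels described in the question.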
Answer 1 (score: -1)
How about using 'GBK' instead of 'gb2312'? It works well.
iconv(x, from = "GBK", to = "UTF-8")
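GBK is a superset of GB2312, which may explain why it handles characters that the gb2312 conversion left garbled. As a rough sketch, the same `iconv` call could be applied to every text column before the data frame is passed to the chart. `to_utf8_columns` is a hypothetical helper (it is not part of base R or rCharts), and the data frame here is simulated, not read from the asker's file.

```r
# Hypothetical helper: convert every text column of a data frame from a
# source encoding (GBK by default, an assumption) to UTF-8.
to_utf8_columns <- function(df, from = "GBK") {
  for (col in names(df)) {
    x <- df[[col]]
    # read.csv returned factors by default in R 3.x, so unwrap them first
    if (is.factor(x)) x <- as.character(x)
    if (is.character(x)) {
      df[[col]] <- iconv(x, from = from, to = "UTF-8")
    }
  }
  df
}

# Simulated input: one row whose title holds GBK bytes.
expected_title <- "\u5143\u8001"  # the title from the first sample row
gbk_df <- data.frame(
  id = 879231132,
  title = iconv(expected_title, from = "UTF-8", to = "GBK"),
  stringsAsFactors = FALSE
)

utf8_df <- to_utf8_columns(gbk_df)

# iconv() yields NA for elements it cannot convert, so check for that too.
conversion_ok <- !anyNA(utf8_df$title) && all(utf8_df$title == expected_title)
```

With the asker's data this would be called on the result of `read.csv` before `hPlot`; whether that alone fixes the chart labels is not confirmed by the thread.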