Encoding not working with searchTwitter in R

Asked: 2014-02-07 14:21:45

Tags: r twitter

I have an encoding problem when searching for tweets. Here is my code (run after authentication):

load("twitteR_credentials")
registerTwitterOAuth(twitCred)

download.file(url="http://curl.haxx.se/ca/cacert.pem", destfile="cacert.pem")


mach_tweets = searchTwitter("bradesco", n=10, lang="pt", cainfo="cacert.pem", encoding='utf-8')

mach_text = sapply(mach_tweets, function(x) x$getText())

When I print the contents of mach_text, I get:

 [1] "Sexta meu amor, eu amo você! (@ Bradesco Promotora) http://t.co/evFs3BnvbV"                                                                       
 [2] "RT @LeiSecaFortal: “@luciadeboraa: @LeiSecaFortal acidente entre tópic 06 e um palio na Av. Antônio sales em frente ao bradesco.transito le…"
 [3] "RT @LeiSecaFortal: “@luciadeboraa: @LeiSecaFortal acidente entre tópic 06 e um palio na Av. Antônio sales em frente ao bradesco.transito le…"
 [4] "RT @DanielSoaresmh: I'm at Bradesco (Cajazeiras, PB) http://t.co/Zl3pgZ01ND"                                                                       
 [5] "RT @EquipeManuGTeen: Quem já comprou seu ingresso pro show da @manugavassi em SP, dia 6/4 no Teatro Bradesco?"                                    
 [6] "I'm at Bradesco (Vitória da Conquista, BA) http://t.co/wmWPnRsY7z"                                                                                
 [7] "RT @proconspoficial: Bradesco não pode bloquear ou cancelar cartão de crédito de inadimplente com o banco http://t.co/zjf27oAKkK"               
 [8] "ALÔ EMBU BUAÇU! \nA Estrela lojas e o banco Bradesco agora uniram-se para facilitar sua vida. EXATAMENTE! Evite... http://t.co/nUvYQ3J2o3"       
 [9] "SERVIÇOS: No @CidadeJardimRN temos caixas eletrônicos do Banco do Brasil, Banco 24h, Caixa Econômica, Bradesco, Santander, ITAÚ E HSBC."       
 [10] "RT @estelanaime: @Bradesco se encontrar um leitor de código de barras." 

Does anyone know how to fix this encoding problem?

Here is some information about my setup:

 sessionInfo()

R version 3.0.2 (2013-09-25)
Platform: i386-w64-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Portuguese_Brazil.1252 LC_CTYPE=Portuguese_Brazil.1252 LC_MONETARY=Portuguese_Brazil.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Brazil.1252

attached base packages:
[1] graphics grDevices utils datasets stats methods base

other attached packages:
 [1] seqinr_3.0-7 wordcloud_2.4 RColorBrewer_1.0-5 Rcpp_0.11.0 tm_0.5-10 twitteR_1.1.7 rjson_0.2.13
 [8] ROAuth_0.9.3 digest_0.6.4 RCurl_1.95-4.1 bitops_1.0-6 sp_1.0-14 ggplot2_0.9.3.1

loaded via a namespace (and not attached):
 [1] colorspace_1.2-4 dichromat_2.0-0 grid_3.0.2 gtable_0.1.2 labeling_0.2 lattice_0.20-23 MASS_7.3-29 munsell_0.4.2
 [9] parallel_3.0.2 plyr_1.8 proto_0.3-10 reshape2_1.2.2 scales_0.2.3 slam_0.1-31 stringr_0.6.2 tools_3.0.2

I am using Windows 7 and RStudio version 0.97.336.

Update: on a Linux machine it works correctly.

sessionInfo()

R version 3.0.2 (2013-09-25)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=C LC_COLLATE=C LC_MONETARY=C LC_MESSAGES=C
 [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=C LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] twitteR_1.1.7 rjson_0.2.13 seqinr_3.0-7 tm_0.5-10 ggplot2_0.9.3.1 ROAuth_0.9.3 digest_0.6.3
 [8] RCurl_1.95-4.1 bitops_1.0-6 wordcloud_2.4 RColorBrewer_1.0-5 Rcpp_0.10.6 data.table_1.8.10 RJDBC_0.2-1
[15] rJava_0.9-4 DBI_0.2-7

loaded via a namespace (and not attached):
 [1] MASS_7.3-29 colorspace_1.2-4 dichromat_2.0-0 grid_3.0.2 gtable_0.1.2 labeling_0.2 munsell_0.4.2 parallel_3.0.2
 [9] plyr_1.8 proto_0.3-10 reshape2_1.2.2 scales_0.2.3 slam_0.1-31 stringr_0.6.2 tools_3.0.2
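
One visible difference between the two listings is LC_CTYPE: Portuguese_Brazil.1252 on Windows and en_US.UTF-8 on Linux. Whether that explains the behaviour is not confirmed here, but a minimal base R sketch for checking which kind of session is running might look like the following. The helper name is_utf8_session is hypothetical and introduced only for illustration.

```r
# Hypothetical helper (not part of twitteR): TRUE when the session's
# native encoding is UTF-8, as reported by base R's l10n_info().
is_utf8_session <- function() {
  isTRUE(l10n_info()[["UTF-8"]])
}

# Character-type locale of the running session,
# e.g. "en_US.UTF-8" or "Portuguese_Brazil.1252".
ctype <- Sys.getlocale("LC_CTYPE")
utf8  <- is_utf8_session()

cat("LC_CTYPE:", ctype, "\n")
cat("UTF-8 session:", utf8, "\n")
```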

1 answer:

Answer 0 (score: 1)

Have you tried Sys.setlocale?
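
A minimal sketch of what that could look like, using only base R. It assumes the tweet text arrives as UTF-8 bytes that R has not marked as UTF-8, which is a guess about the cause and not something the question confirms. The sample string is built by hand from raw bytes and only stands in for a tweet. Locale names are platform-specific, so the Sys.setlocale call below simply re-applies the current value to show the call shape.

```r
# Inspect the current character-type locale; on the asker's Windows
# machine this is reported as Portuguese_Brazil.1252.
current_ctype <- Sys.getlocale("LC_CTYPE")

# Call shape of Sys.setlocale(); re-applying the current value changes
# nothing. A different locale name would have to be valid on the platform.
Sys.setlocale("LC_CTYPE", current_ctype)

# Stand-in for one tweet: the UTF-8 bytes of "você" ("ê" is c3 aa).
# rawToChar() returns a string with no declared encoding.
tweet <- rawToChar(as.raw(c(0x76, 0x6f, 0x63, 0xc3, 0xaa)))

# Option 1: declare the bytes as UTF-8 without modifying them.
Encoding(tweet) <- "UTF-8"

# Option 2: re-encode to Latin-1, which a CP1252 session can display.
tweet_latin1 <- iconv(tweet, from = "UTF-8", to = "latin1")

print(Encoding(tweet))
print(charToRaw(tweet_latin1))
```

Under the same assumption, the vector from the question could be handled the same way, for example with Encoding(mach_text) <- "UTF-8", since Encoding<- works on whole character vectors.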