UTF-8 content viewed in the console differs from the exported result

Time: 2015-09-17 11:49:01

Tags: r encoding utf-8

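I am trying to use url_decode to decode a large number of URLs containing text in different languages (Thai / Vietnamese / Chinese).

The encoded URLs look like this: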

click_search_search=Hanmyshop&by=pop&order=des
click_search_search=sp1114&by=pop&order=des
click_search_search=hanmyshop&by=pop&order=des
click_search_search=%C4%91%E1%BB%93ng%20h%E1%BB%93&by=pop&order=des
click_search_search=Sp1114&by=pop&order=des
click_search_search=nike&by=pop&order=des
click_search_search=%E4%BA%8C%E6%89%8B&by=pop&order=des
click_search_search=%E6%89%8B%E9%8C%B6&by=pop&order=des
click_search_search=%E5%BE%8C%E8%83%8C%E5%8C%85&by=pop&order=des
click_search_search=%E8%BF%AA%E5%A3%AB%E5%B0%BC&by=pop&order=des
click_search_search=iphone&by=pop&order=des

I use the following code in R to decode them:

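# Pull the encoded URLs out of the data frame as a plain vector
url <- as.vector(book1$Testing.URL)

# Percent-decode each URL
de_url <- url_decode(url)

# Mark the decoded strings as UTF-8
Encoding(de_url) <- "UTF-8"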
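For readers without the data frame at hand, the decoding step can be reproduced on two of the URLs listed above. This is only a sketch: it swaps in base R's utils::URLdecode() for url_decode() (the question does not say which package url_decode comes from), and the names encoded and decoded are made up for illustration.

```r
# Two of the encoded query strings from the question.
encoded <- c(
  "click_search_search=%C4%91%E1%BB%93ng%20h%E1%BB%93&by=pop&order=des",
  "click_search_search=%E4%BA%8C%E6%89%8B&by=pop&order=des"
)

# Base-R stand-in for url_decode() (an assumption, not the asker's call).
# Applied element by element so it works whether or not URLdecode()
# is vectorised in the installed R version.
decoded <- vapply(encoded, utils::URLdecode, character(1), USE.NAMES = FALSE)

# The decoded bytes are UTF-8, so declare them as such.
Encoding(decoded) <- "UTF-8"

# Inspect the encoding marks and the decoded strings.
print(Encoding(decoded))
print(decoded)
```

Whether the printed strings show the real characters or escapes depends on the platform and locale of the R session, which is the behaviour the question is about.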

The result shown in the console:

click_search_search=Hanmyshop&by=pop&order=des
click_search_search=sp1114&by=pop&order=des
click_search_search=hanmyshop&by=pop&order=des
click_search_search=đồng hồ&by=pop&order=des
click_search_search=Sp1114&by=pop&order=des
click_search_search=nike&by=pop&order=des
click_search_search=二手&by=pop&order=des
click_search_search=手錶&by=pop&order=des
click_search_search=後背包&by=pop&order=des
click_search_search=迪士尼&by=pop&order=des
click_search_search=iphone&by=pop&order=des

Then I add a separate column named "Decoded.URL" to book1:

book1$Decoded.URL=de_url

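After running View(book1), the result shown differs from the console: all of the Vietnamese and Chinese characters are replaced by escapes in a format like <U+1E3>.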

I tried write.table with fileEncoding="utf-8", but it did not help: the Chinese characters display correctly, while the Vietnamese ones do not. Any idea how to solve this?
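One thing that may be worth trying, sketched here under assumptions the question does not confirm: if the escapes come from R converting strings to the session's native encoding while writing, then writing the UTF-8 bytes directly with writeLines(..., useBytes = TRUE) avoids that conversion. The strings, the temporary file path out_path, and the variable names below are illustrative only.

```r
# Illustrative decoded strings (Vietnamese and Chinese), written with
# \u escapes so the script itself stays ASCII.
decoded <- c(
  "click_search_search=\u0111\u1ed3ng h\u1ed3&by=pop&order=des",
  "click_search_search=\u624b\u9336&by=pop&order=des"
)

# Hypothetical output location; replace with a real path as needed.
out_path <- tempfile(fileext = ".txt")

# Open the file in binary mode and write the strings byte for byte.
# enc2utf8() makes sure the bytes are UTF-8; useBytes = TRUE tells
# writeLines() not to re-encode them to the native encoding.
con <- file(out_path, open = "wb")
writeLines(enc2utf8(decoded), con, useBytes = TRUE)
close(con)

# Read the file back, declaring its contents as UTF-8.
back <- readLines(out_path, encoding = "UTF-8")
print(all(back == decoded))
```

This only covers the exported file. How View() renders the data frame is a separate display matter and is not addressed by the sketch.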

0 answers:

No answers yet