Convert Unicode in an existing file to readable text (RStudio)

Posted: 2019-06-19 14:16:32

Tags: r text

I have a text file that contains text in several languages. I read the file with encoding = "UTF-8":

d =  read.csv("Text.csv", 
              stringsAsFactors = FALSE,
              encoding = "UTF-8")
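In read.csv, the encoding argument only marks the strings as UTF-8 after they are read; it does not re-encode them. The fileEncoding argument declares the file's encoding so the text is re-encoded while it is read. A minimal sketch, assuming the file really is UTF-8 and that R runs in a UTF-8 locale (Linux/macOS, or Windows with R 4.2 or later). The temporary file and its two rows are made up for illustration.

```r
# Illustrative data: a tiny UTF-8 CSV written to a temporary file.
path <- tempfile(fileext = ".csv")
lines <- c("author,n",
           "\u0410\u043b\u043b\u0430,1",
           "\u03a4\u{03b9}e\u03c1\u03af,1")
writeLines(enc2utf8(lines), path, useBytes = TRUE)  # write the raw UTF-8 bytes

# fileEncoding re-encodes the file contents while reading;
# encoding = "UTF-8" would only mark the strings as UTF-8.
d <- read.csv(path, stringsAsFactors = FALSE, fileEncoding = "UTF-8")
unlink(path)  # clean up the temporary file

# Check the code points instead of trusting how the console prints them.
utf8ToInt(enc2utf8(d$author[1]))
```

If the code points come back as the expected Cyrillic values, the data itself is fine and any <U+XXXX> you see is only a display issue.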

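Because the text is in foreign languages, some of it shows up as Unicode codes such as <U+0410>. How can I convert these codes back so that the text displays in its original language?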

d # Output of the chunk in RStudio


:ohemad: (UID 73271507) 
“SHOOT FIRST ASK QUESTIONS LATER” : WHAT HAPPENS TO A UFO WHEN TRACKED ON MILITARY RADAR - Black Barth  
“Shoot First Ask Questions Later” : What Happens To A UFO When Tracked on Military Radar – Mystical Shire   
<U+03A4><U+03B9>e<U+03C1><U+03AF> <U+039C>e<U+03CA>s<U+03AC><U+03BD>    
<U+0410><U+043B><U+043B><U+0430> <U+0411><U+0435><U+043B><U+044C><U+043A><U+0435><U+0432><U+0438><U+0447>   
<U+0410><U+043D><U+0434><U+0440><U+0435><U+0439> <U+0418><U+0432><U+0430><U+043D><U+043E><U+0432>
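Output like the above can mean two different things. Either the strings hold the real characters and only the console prints them as <U+XXXX> (typically seen on Windows builds of R before 4.2 when the characters are not in the system code page), or the strings literally contain the text "<U+0410>". A small check can tell the two apart. In this sketch, has_literal_escapes() is a hypothetical helper, and the two example strings are taken from the output above.

```r
# Hypothetical helper: TRUE where a string literally contains "<U+" escape text.
has_literal_escapes <- function(x) grepl("<U+", x, fixed = TRUE)

real    <- "\u0410\u043b\u043b\u0430"          # four actual Cyrillic characters
literal <- "<U+0410><U+043B><U+043B><U+0430>"  # the escape text itself

has_literal_escapes(c(real, literal))
nchar(c(real, literal))  # character counts differ: 4 vs 32
```

If the check returns TRUE for your column, the characters must be decoded. If it returns FALSE, the data is intact and only the display needs fixing, for example by reading with fileEncoding or using a UTF-8 locale.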

Here is a small sample of the data (dput output):

structure(list(author = c("-NO AUTHOR-", "# 1 NWO Hatr", ":ohemad: (UID 73271507)", 
"“SHOOT FIRST ASK QUESTIONS LATER” : WHAT HAPPENS TO A UFO WHEN TRACKED ON MILITARY RADAR - Black Barth", 
"“Shoot First Ask Questions Later” : What Happens To A UFO When Tracked on Military Radar – Mystical Shire", 
"<U+03A4><U+03B9>e<U+03C1><U+03AF> <U+039C>e<U+03CA>s<U+03AC><U+03BD>", 
"<U+0410><U+043B><U+043B><U+0430> <U+0411><U+0435><U+043B><U+044C><U+043A><U+0435><U+0432><U+0438><U+0447>", 
"<U+0410><U+043D><U+0434><U+0440><U+0435><U+0439> <U+0418><U+0432><U+0430><U+043D><U+043E><U+0432>", 
"<U+0410><U+0440><U+0438><U+044D><U+043B><U+044C> <U+041D><U+043E><U+0439><U+043E><U+043B><U+0430> <U+0420><U+043E><U+0434><U+0440><U+0438><U+0433><U+0435><U+0441>", 
"<U+0412><U+043B><U+0430><U+0434><U+0430> <U+041A><U+0440><U+0443><U+0442><U+043E><U+0432><U+0430>"
), n = c(54L, 17L, 1L, 1L, 1L, 1L, 4L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-10L))
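If the strings literally contain "<U+XXXX>" text (which is what evaluating the structure() above produces), one way to restore them is to replace each escape with the character it names. This is a base-R sketch. It assumes every escape has the form <U+, then 4 to 8 hex digits, then >. The name decode_unicode_escapes() is hypothetical, and the data frame reuses three rows of the sample.

```r
# Hypothetical helper: replace each literal "<U+XXXX>" escape with its character.
decode_unicode_escapes <- function(x) {
  pattern <- "<U\\+([0-9A-Fa-f]{4,8})>"
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    m <- gregexpr(pattern, s)
    escapes <- regmatches(s, m)[[1]]
    if (length(escapes) == 0) return(s)          # nothing to decode
    hex <- sub(pattern, "\\1", escapes)          # e.g. "0410"
    chars <- intToUtf8(strtoi(hex, 16L), multiple = TRUE)
    regmatches(s, m) <- list(chars)              # swap escapes for characters
    s
  }, character(1), USE.NAMES = FALSE)
}

d <- data.frame(
  author = c(":ohemad: (UID 73271507)",
             "<U+03A4><U+03B9>e<U+03C1><U+03AF> <U+039C>e<U+03CA>s<U+03AC><U+03BD>",
             "<U+0410><U+043B><U+043B><U+0430> <U+0411><U+0435><U+043B><U+044C><U+043A><U+0435><U+0432><U+0438><U+0447>"),
  n = c(1L, 1L, 1L),
  stringsAsFactors = FALSE
)
d$author <- decode_unicode_escapes(d$author)
d$author
```

Strings without escapes (and NA values) pass through unchanged, so the helper can be run over the whole author column.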

0 answers: no answers yet.