I have a text file that contains text in several languages. I read the file with the encoding set to "UTF-8":
d = read.csv("Text.csv",
             stringsAsFactors = FALSE,
             encoding = "UTF-8")
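Because of the foreign-language text, some of my strings show up as Unicode escapes such as <U+0410>. How can I convert these so that the text reads in its original language?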
d # Output of the chunk in RStudio
:ohemad: (UID 73271507)
“SHOOT FIRST ASK QUESTIONS LATER” : WHAT HAPPENS TO A UFO WHEN TRACKED ON MILITARY RADAR - Black Barth
“Shoot First Ask Questions Later” : What Happens To A UFO When Tracked on Military Radar – Mystical Shire
<U+03A4><U+03B9>e<U+03C1><U+03AF> <U+039C>e<U+03CA>s<U+03AC><U+03BD>
<U+0410><U+043B><U+043B><U+0430> <U+0411><U+0435><U+043B><U+044C><U+043A><U+0435><U+0432><U+0438><U+0447>
<U+0410><U+043D><U+0434><U+0440><U+0435><U+0439> <U+0418><U+0432><U+0430><U+043D><U+043E><U+0432>
Here is a small sample of the data (dput output):
structure(list(author = c("-NO AUTHOR-", "# 1 NWO Hatr", ":ohemad: (UID 73271507)",
"“SHOOT FIRST ASK QUESTIONS LATER” : WHAT HAPPENS TO A UFO WHEN TRACKED ON MILITARY RADAR - Black Barth",
"“Shoot First Ask Questions Later” : What Happens To A UFO When Tracked on Military Radar – Mystical Shire",
"<U+03A4><U+03B9>e<U+03C1><U+03AF> <U+039C>e<U+03CA>s<U+03AC><U+03BD>",
"<U+0410><U+043B><U+043B><U+0430> <U+0411><U+0435><U+043B><U+044C><U+043A><U+0435><U+0432><U+0438><U+0447>",
"<U+0410><U+043D><U+0434><U+0440><U+0435><U+0439> <U+0418><U+0432><U+0430><U+043D><U+043E><U+0432>",
"<U+0410><U+0440><U+0438><U+044D><U+043B><U+044C> <U+041D><U+043E><U+0439><U+043E><U+043B><U+0430> <U+0420><U+043E><U+0434><U+0440><U+0438><U+0433><U+0435><U+0441>",
"<U+0412><U+043B><U+0430><U+0434><U+0430> <U+041A><U+0440><U+0443><U+0442><U+043E><U+0432><U+0430>"
), n = c(54L, 17L, 1L, 1L, 1L, 1L, 4L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-10L))
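
For reference, here is a minimal sketch of the conversion I am after. It assumes the strings literally contain the text `<U+0410>` and so on, rather than R only printing them that way because the console cannot display the characters. The helper name `unescape_unicode` is made up for this example; `gregexpr`, `regmatches`, `strtoi` and `intToUtf8` are base R functions.

```r
# Hypothetical helper: replace literal "<U+XXXX>" escapes with the
# characters they name. Assumes the escapes are really stored in the strings.
unescape_unicode <- function(x) {
  pattern <- "<U\\+([0-9A-Fa-f]{4,6})>"
  # Locate every escape in every element of x
  m <- gregexpr(pattern, x, perl = TRUE)
  # Convert each matched escape to its character and write it back in place
  regmatches(x, m) <- lapply(regmatches(x, m), function(esc) {
    hex <- sub(pattern, "\\1", esc, perl = TRUE)   # keep only the hex digits
    vapply(hex,
           function(h) intToUtf8(strtoi(h, base = 16L)),
           character(1),
           USE.NAMES = FALSE)
  })
  x
}

# Example on one value from the sample data
fixed <- unescape_unicode("<U+0410><U+043B><U+043B><U+0430> <U+0411><U+0435><U+043B><U+044C><U+043A><U+0435><U+0432><U+0438><U+0447>")
```

If this approach is sound, it could presumably be applied to the whole column with `d$author <- unescape_unicode(d$author)`. Whether the result displays correctly may still depend on the locale and the console font.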